PREFACE


During the past quarter of a century the subject of Probability 
has acquired a new importance in science, partly because of the 
more recent stress on statistical laws in mechanics and partly 
because of the rapidly expanding use of statistical methods in 
medical, biological, engineering, industrial, and social problems. 

Writers have approached Probability from very diverse 
angles but little attempt has been made at any sort of unification.
At times it is regarded as a branch of symbolic logic,
sometimes as a series of empirical conclusions based on experimental
practice. Certain writers see it as a branch of pure
mathematics, others as a description of a state of mind. To 
some it is of philosophical, to others of scientific importance. 

The authors have taken the view that Probability is an essential
of scientific method, and that a probability estimate, however
it is approached, has to be seen and interpreted as a guide
in scientific procedure. Thus these various treatments are in 
reality partial aspects of the same topic, where in each case the 
form of analysis has been decided by the particular scientific 
purpose for which the treatment has been attempted. 

The present book, claiming to be no more than an elementary
treatment, makes no effort to cover all these fields. The earlier
mathematical portions are restricted mainly to simple
considerations of Mathematical Probability and its linkage with
Statistics in a form suitable for non-mathematical students; 
hence the inclusion of the material of Chapters III and IV. At 
the same time the authors have striven to provide a detailed 
criticism of the various self-contained theories of probability 
that have been advanced from time to time. This has
compelled them to embark on certain considerations of scientific
method and, later in the book, on more advanced mathematical 
problems in Probability, without, however, entering into fields 
such as Statistics proper or other branches of physical science 
farther than has been essential for this purpose. 

While most of the examples are new, a number have been 
selected from Whitworth’s Choice and Chance and these the 
authors here gladly acknowledge. 


H.L. 
L. R.


CHAPTER I


HISTORICAL INTRODUCTION 

The theory of probability arises from a number of different 
sources. It already manifests itself in certain practices
resembling insurance which were known to antiquity; thus, the
Roman collegium or guild paid a sum of money to the surviving
relatives upon the death of a member, a custom which was
continued by the medieval guilds. In 324 B.C. a Greek named
Antimenes devised the first system of insurance mentioned in
history; he guaranteed owners against the loss of their slaves
for a premium of 8 per cent per annum. The marine insurance
trade likewise originated in Greek times with the practice of 
bottomry or sea-loans; when a merchant sent a cargo abroad 
he received an agreed sum from a banker which he repaid with 
interest if the cargo arrived safely, but retained if it failed to 
do so. It seems clear that in such bargains the prevailing rate of 
interest was high. 

The early history of insurance does not appear yet to have 
been thoroughly explored; that of banking and exchange is, on 
the other hand, well documented. In the fifth century B.C.
banks had already been established in Athens. We know that 
by the end of the thirteenth century the Italian and, more 
especially, the Florentine merchants dominated the entire trade 
of Europe, and that in 1350 they had banking establishments in 
most of the European capitals; their power was such that they 
were able to finance wars, control international exchanges, and 
dictate monetary policy at large. It may be added that at this 
time a regular rate of exchange began to be quoted in London 
between English and Flemish currency. 

Henceforward financial operations in Europe took on
something of their present-day character, including the deliberate
policies of inflation and deflation with which we are only too 
familiar. In this connexion we may note the steps taken by 
Sir Thomas Gresham, in 1552-3, to restore fallen English credit 
by pegging the exchange, selling foreign currency in Antwerp, 
and placing restrictions upon the trade with Flanders. All these 
operations involved actuarial problems in probability, however
rudimentary. The methods of insurance, which date, as we
have seen, from very early Greek times, developed without any 
aid from the actuarial principles with which they are nowadays 
associated: these latter grew out of a different order of ideas, 
which we have now to consider. 

It is not until the Renaissance that the subject begins to 
re-emerge in a new setting. During the sixteenth and
seventeenth centuries a great deal of the leisure of the European
aristocracy was occupied with games of chance and gambling 
in general. This class did not number among its members any 
mathematicians capable of handling the problems that naturally 
suggested themselves, but nevertheless it happened that from 
time to time problems of chance were passed on to the
mathematicians of the period. Perhaps the only exception to this
rule was Cardan, himself an inveterate gambler (notorious for 
his theft of Tartaglia’s solution of the cubic) who, somewhere 
about 1550, wrote a small gambler’s manual; the book was not, 
however, published until 1663. Galileo (1564-1642) had his 
attention directed by an Italian nobleman to a problem in dice, 
the solution of which is the first recorded result in the history of 
mathematical probability. 

The problem is as follows: Whereas when three dice are thrown the 
numbers 9 and 10 can each be obtained in 6 ways (different from each 
other), yet it is found from actual experience that 10 appears more often 
than 9. How can this be accounted for? In his work (which did not 
appear until 1718) Galileo makes an analysis of all possible cases and 
shows that, of 216 possible ways of throwing three dice, 27 are favourable 
to the 10 and 25 to the 9. Nowadays we should solve such a problem 
by the method of Chapter VII; it represents the first successful attempt 
to explain the frequency of appearance of certain groups of numbers by 
an analysis of the possibilities that might arise. 
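
Galileo's enumeration is easily checked by machine. What follows is only a minimal sketch in Python, not a reconstruction of his own working: the dice are assumed fair and distinguishable, and the function names are ours. It lists all 216 ordered throws, counts those giving each total, and confirms that 9 and 10 each arise from 6 combinations when the order of the faces is ignored.

```python
from collections import Counter
from itertools import product

FACES = range(1, 7)  # a fair six-faced die (assumption)


def ordered_throw_counts():
    """Number of the 216 ordered throws of three dice giving each total."""
    return Counter(sum(throw) for throw in product(FACES, repeat=3))


def unordered_combinations(total):
    """Distinct sets of three faces (order ignored) adding up to `total`."""
    return {tuple(sorted(t)) for t in product(FACES, repeat=3) if sum(t) == total}


counts = ordered_throw_counts()
print(len(unordered_combinations(9)), len(unordered_combinations(10)))  # 6 6
print(counts[9], counts[10])  # 25 27
```

The apparent paradox disappears once it is seen that the 6 combinations are not equally likely: a combination such as 3, 3, 3 can be thrown in only one way, whereas 1, 2, 6 can be thrown in six.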

Twelve years after Galileo’s death a correspondence began 
between Pascal and Fermat which gave the first real impetus 
to the theory. The Chevalier de Méré, a French gentleman
with mathematical interests, propounded certain questions to
Pascal, who communicated them to Fermat. Of these the most
important is the famous ‘Problem of Points’ which in varying
forms was to occupy a central place in the theory for the next 
century and a half. It was first enunciated by Pascal in 1654, as 
follows: Two players, with equal chances of winning a point, are
playing a game for three points. If they wish to break off the
game before the end, how shall the stakes be divided? Pascal
solves this problem and later enunciates without proof the
results for a game for n+1 points in the case where one player
has already n points and the other none, and where one player 
has one point and the other none. 

Fermat’s solution of the problem, given at the same time, is 
for the case where one player requires 2 points and the other 
3 points, to win; his method is essentially the same as that given 
later in Chapter V. Pascal applies this method to a similar 
problem in which there are three players. In the same year was 
printed his Traité du triangle arithmétique, which is the earliest
treatise on the theory of combinations, and contains, among
other things, the familiar formula for the binomial coefficient
$^{n}C_{r}$. Pascal uses the results of this work to solve the problem
of points in the case where one player requires m points and the 
other n points to win. 
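
The general case lends itself to a short computation. The sketch below follows one common reading of the enumeration argument, and is not offered as the exact procedure of either Pascal or Fermat: if one player needs a points and the other b, at most a + b - 1 further points settle the game, so we imagine them all played and count the equally likely sequences in which the first player takes at least a of them. The function name `share_of_stakes` is hypothetical.

```python
from fractions import Fraction
from math import comb


def share_of_stakes(a, b):
    """Share due to the player needing `a` points when the opponent needs `b`.

    Each point is assumed to be an even chance. All a + b - 1 remaining
    points are imagined played; the first player wins the game in exactly
    those sequences where he takes at least `a` of them.
    """
    n = a + b - 1
    favourable = sum(comb(n, k) for k in range(a, n + 1))
    return Fraction(favourable, 2 ** n)


# Fermat's case: one player requires 2 points and the other 3.
print(share_of_stakes(2, 3))  # 11/16
```

In Fermat's case the stakes are therefore divided in the ratio 11 to 5.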

In all this we see that the setting of the problems is a 
gambler’s one, although both Pascal and Fermat are interested 
primarily in the mathematical analysis. In this connexion we 
may note a distinction between the progress of the theory in 
Catholic and Protestant countries; in the latter the interest was 
concentrated on quite different topics—thus, Newton, who was 
born the year Galileo died, seems hardly to have concerned 
himself with questions of this nature. Almost the sole exception 
was Huygens who in 1657 produced the first treatise on gaming 
and dicing problems. This remained the best account of
probability until the advent of James Bernoulli, Montmort, and De
Moivre, all citizens—at any rate by birth—of countries in which 
gambling was not frowned upon, that is, in which the Catholic 
feudal aristocracy was not yet restricted by the rising Puritan 
class of burghers. 

To us the interesting feature of the development of
probability at this time is the fact that it began to be cultivated,
apparently on a different basis, in England and Holland. These
countries were Puritan because the burgher class, the townsmen,
had already succeeded in asserting themselves; they were
more interested in problems of trade exchange and questions
related to the growth of town population. Thus in 1662 we find
Captain John Graunt devising a method for utilizing the weekly
returns of deaths in the City of London to determine the growth 
of the capital, while in 1671 John De Witt published researches 
on the mathematics of annuities, in Holland. Halley the 
astronomer published a memoir in the Philosophical
Transactions for 1693 based on the tables of births and deaths for the
city of Breslau during the period 1687-91. He gives a table 
showing the numbers of the population aged n years, and shows 
how to find the value of an annuity on the life of a person of 
given age. He constructs a table of annuities for every fifth year 
of age up to 70 years; and he considers also the question of 
annuities on joint lives. 

From the end of the seventeenth century to the middle of the 
eighteenth century was one of the most fertile periods in the 
history of the purely mathematical theory. During this period 
James Bernoulli (1654-1705), Montmort (1678-1719), and 
De Moivre (1667-1754) between them developed the greater 
part of the elementary theory as it is known to-day, illustrating 
their work throughout by problems in games of chance, from 
which it originated. 

To James Bernoulli is due an extension of the problem of points; 
he obtains, substantially by present-day methods, the probability of 
throwing a given number with n dice; and he solves the problem of the 
‘duration of play’, that is, of finding the probability that a player should 
win all his opponent’s money, given the players’ initial capital and their 
respective chances of winning a point. But his remarkable contribution 
to the theory is the theorem known by his name (pp. 58-60), the second 
part of which consists of an approximation to a probability by purely 
algebraic methods. 

The work of Montmort goes over the familiar ground of dice 
and card problems; in addition it comprises valuable additions 
to the theory of permutations and derangements, including the 
solution of the ‘problem of treize’ (p. 97), and contains the 
elements of finite differences and the theory of recurrence 
relations. Many of these results were arrived at independently 
by De Moivre to whom, moreover, are due the formulae for the 
chance of throwing a given number with an n-faced die, and that 
of an event succeeding consecutively a given number of times. 
To De Moivre is due the idea of approximating to probability 
formulae by means of logarithms; in this connexion he discusses
the approximation to the value of the binomial coefficients
occurring in Bernoulli’s Theorem, and gives a formula which is 
practically equivalent to Stirling’s Theorem (p. 67); it would 
appear that this theorem had been discovered at about the same 
time by Stirling himself. 

We thus see that, in the effort to discover new mathematical 
methods to handle problems in probability, there emerged a 
great deal of work on permutations and combinations, finite 
differences, recurring series, the idea of summation of infinite 
series, and many new trigonometrical formulae. These were 
still in the main a continuation of developments in the Latin 
countries; the problems dealt with were those that arose from 
the way of living of the aristocracy. But a new period was 
setting in, one of criticism and examination preparatory to the 
French Revolution of half a century later. We can observe the 
beginnings of this phase in the controversies that arose between 
Leibniz and James Bernoulli; the latter had attempted, by 
inverting his theorem on the probability of occurrence of a 
group of events, to determine the probability of the event 
itself. Thus what later became a major issue, the ‘probability of 
causes’, was raised in mathematical and philosophical form for 
the first time. 

Meanwhile, under the influence of the work of English
experimentalists, mathematical physicists, and astronomers, the same
problem arose in a new form, one associated with what is called 
the ‘theory of errors’, the reasons that can be adduced to 
explain why sets of observations of the same measured quantity 
are always, to some extent, discordant among themselves. This 
was a problem of theoretical science, arising from the needs of 
experimental practice, and it was one that was certain to 
intrigue natural philosophers studying scientific laws from a 
mechanistic standpoint. 

From the scientific point of view Thomas Simpson, in his 
Miscellaneous Tracts (1757), was the first to examine critically 
the implications of taking the mean of a set of astronomical 
observations of the same event. Thus this theory, now an 
integral part of the subject of the significance of errors, owes 
its origin to astronomical needs. Naturally, the French
experimentalists were by now equally concerned with the same
problem. In 1770 Lagrange published his memoir on the method
of taking the best value from among a series of observations. 

This work, which had in part been anticipated by Simpson, discusses 
the probability that the error of the mean of n observations should lie 
within assigned limits, and determines the most probable error of the 
mean. Again, if it is known that the errors in a set of observations 
must be one of the numbers ±1, ±2,..., ±m, and that the chances of 
these errors are equal, or proportional to given quantities, Lagrange 
shows how to determine the probability that the error of the mean 
should have an assigned value or lie within given limits. 
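
The expansion of a multinomial expression described here amounts to repeated convolution of the distribution of a single error. The following is a minimal sketch for the equal-chance case only; it assumes, in addition to what is stated above, that the n errors are independent, and the function names are ours.

```python
from collections import defaultdict
from fractions import Fraction


def sum_of_errors_distribution(n, m):
    """Exact distribution of the sum of n independent errors, each equally
    likely to be any of the 2m values +-1, +-2, ..., +-m."""
    single = {e: Fraction(1, 2 * m) for e in range(-m, m + 1) if e != 0}
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for s, ps in dist.items():
            for e, pe in single.items():
                nxt[s + e] += ps * pe  # one more factor of the multinomial
        dist = dict(nxt)
    return dist


def prob_mean_within(n, m, limit):
    """Probability that the mean of the n errors lies within +-limit."""
    dist = sum_of_errors_distribution(n, m)
    return sum(p for s, p in dist.items() if abs(Fraction(s, n)) <= limit)


print(prob_mean_within(2, 1, Fraction(1, 2)))  # 1/2
```

Unequal chances, proportional to given quantities, could be handled in the same way by changing the entries of `single`.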

All these results are obtained by expansion of multinomial 
expressions and other purely algebraic processes; but at the 
same time a new conception was introduced by Simpson and 
Lagrange which proved later to be exceedingly fertile in analysis:
the idea of an error curve. For reasons to be explained in this
book, ‘errors’ or divergences from the ‘true’ value necessarily 
consist of a discontinuous set of data; but apart from the 
calculus of finite differences, which was still a comparatively 
new and little known subject, the whole field of mathematics 
concerned itself with ‘continuous’ phenomena. Thus, in the 
face of mathematical limitations, the facts regarding the nature 
of error were altered to suit, and both Simpson and Lagrange 
introduced the notion of continuous variation in error. The 
analogy did not proceed very far; but nevertheless, the concept 
of errors in a continuum x with a probability function φ(x) had
now found its place. 

In 1778 Daniel Bernoulli published a memoir on errors of observations,
in which he remarks that the common method of treating discordant
observations, by assuming that the true observation is the mean,
presupposes that they are of equal weight, whereas small errors are
surely more probable than large ones. Bernoulli therefore proposes to
measure the probability of an error x by the number $\sqrt{r^2 - x^2}$, where
r is a constant; then the best value x to be obtained from a set of
observations $x_1, x_2, \ldots, x_n$ will be that which makes the product
$\sqrt{r^2 - (x_1 - x)^2}\,\sqrt{r^2 - (x_2 - x)^2} \cdots$ a maximum. In effect Bernoulli thus
assumes the probability curve to be a circle and applies to it the method 
of inverse probability (p. 164). 
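
Bernoulli's rule can be tried numerically. The sketch below is an illustration only and not his own procedure: it maximizes the logarithm of the product by a simple ternary search, which is legitimate because that logarithm is concave in x. The function name, the sample observations, and the value of r are all assumptions made for the example.

```python
from math import log


def bernoulli_best_value(observations, r, iterations=100):
    """Value x maximizing the product of sqrt(r**2 - (x_i - x)**2)."""
    lo = max(observations) - r  # x must lie within r of every observation
    hi = min(observations) + r
    if lo >= hi:
        raise ValueError("r is too small for these observations")

    def log_product(x):
        return sum(0.5 * log(r * r - (xi - x) ** 2) for xi in observations)

    # Ternary search on a concave function over the open interval (lo, hi).
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_product(m1) < log_product(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


obs = [0.0, 0.2, 1.0]  # hypothetical observations
best = bernoulli_best_value(obs, r=2.0)
print(best, sum(obs) / len(obs))
```

For these figures the value obtained lies somewhat above the arithmetic mean 0.4, being drawn towards the outlying observation; for observations placed symmetrically the two coincide.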

The idea of continuity in connexion with probability shows itself in 
other researches of Daniel Bernoulli, in which his purpose is to demonstrate
the use of the differential calculus. For example, he discusses
the probable distribution of liquid in three urns, initially containing
different liquids, if for a time t liquid is allowed to flow from the first
to the second, from the second to the third, and from the third to the
first. We may also note here the work of Buffon who in 1777 applied
the notion of probability to geometrical problems; thus, if a coin is
thrown on a table ruled in squares or equilateral triangles, it is required
to find the probability that it will fall clear of the bounding lines.
Buffon’s most famous problem (p. 86), requiring the use of integral
calculus for its solution, is found in the same work. It is of interest to
note that the result has several times been used to calculate experimentally
the value of π with, however, suspiciously good results.
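
The coin problem for a table ruled in squares may be illustrated as follows. This is a sketch under assumptions not stated in the text: the coin is a circle of diameter d smaller than the side a of a square, and its centre is taken to fall uniformly at random within a square. The coin then falls clear when its centre lies in the inner square of side a - d, so that the chance is ((a - d)/a)^2, and a simulation can be set against this value. The function names and the figures chosen are ours.

```python
import random


def clear_of_lines_exact(side, diameter):
    """Chance that the coin falls clear of the lines of a square ruling."""
    if diameter >= side:
        return 0.0
    return ((side - diameter) / side) ** 2


def clear_of_lines_simulated(side, diameter, trials, seed=0):
    """Estimate the same chance by throwing the coin `trials` times."""
    rng = random.Random(seed)
    radius = diameter / 2
    clear = 0
    for _ in range(trials):
        # Centre of the coin, uniform over one square (assumption).
        x = rng.uniform(0, side)
        y = rng.uniform(0, side)
        if radius < x < side - radius and radius < y < side - radius:
            clear += 1
    return clear / trials


print(clear_of_lines_exact(1.0, 0.5))
print(clear_of_lines_simulated(1.0, 0.5, trials=100_000))
```

With a coin half as wide as the square the exact chance is one quarter, and the simulated frequency should be found close to it, though never suspiciously so.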

The critical work of the French Encyclopédistes, to which we
have already alluded, did not proceed far, conducted as it was
by individuals who were for the most part non-mathematicians
and who failed therefore to distinguish between those considerations
which are mathematically and those which are socially
important. Even a distinguished mathematician like D’Alembert,
who directed his criticism at the fundamental definitions
in probability theory, succeeded only in arriving at the most
preposterous conclusions. The Marquis de Condorcet dealt
with such questions as the probability of election of a candidate
by a given number of voters, and the probability of a tribunal
arriving at a true verdict in a trial. In view of his faith in the
necessary progress of the human race towards happiness and 
perfection, it is one of the ironies of history that he himself was 
condemned by the revolutionary tribunal. 

It is during this period that the problem of ‘inverse probability’,
first considered by James Bernoulli, again shows itself,
in two posthumous memoirs by Bayes which appeared in the
Philosophical Transactions for 1764-5. Bayes gives, in geometrical
form, the theorem that, if an event has happened p times and
failed q times, the probability that the chance of success will
lie between the values a and b (all values being equally likely) is
$\int_a^b x^p (1-x)^q \, dx \big/ \int_0^1 x^p (1-x)^q \, dx$. Bayes then proceeds to evaluate
these integrals by approximation. It would be interesting to
discover whether the investigations of Euler and Legendre on
the Beta function $\int_0^1 x^p (1-x)^q \, dx$, which began shortly after
1770, were suggested by the work of Bayes. For us, however, its
importance lies in the evidence it affords of the convergence of
the subject-matter treated in England towards that of France
on the threshold of the Revolution: this, of course, is only a
slight aspect. Bayes, himself a clergyman living in the middle
of the eighteenth century, turned his attention to these questions,
directly or indirectly, under the influence of a sceptic like
Hume (1711-76) or an idealist like Berkeley (1685-1753). These 
latter were themselves working on the ideas of Locke (1632- 
1704) and Hobbes (1588-1679). Hume, we know, made frequent 
contact with, and was much influenced by French writers; thus 
it was in this atmosphere that Bayes attempted to state in 
symbolical form the relation between cause and effect as it 
shows itself in probability. It is worth recollecting that, diverse 
as their outlooks may be on other matters, Locke, Berkeley, and 
Hume are at one in their distrust of mathematical reasoning 
and tend to rely on probability rather than on certainty. 
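
The ratio of integrals in Bayes's theorem, quoted above, can be evaluated exactly when p and q are whole numbers, since the integrand is then a polynomial. The following sketch expands (1 - x)^q by the binomial theorem and integrates term by term; it is offered as an illustration, not as Bayes's own method of approximation, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def integral_xp_1mx_q(p, q, lower, upper):
    """Exact integral of x**p * (1 - x)**q from `lower` to `upper`,
    for non-negative whole numbers p and q."""
    lower, upper = Fraction(lower), Fraction(upper)
    total = Fraction(0)
    for k in range(q + 1):
        power = p + k + 1
        # k-th term of the binomial expansion of (1 - x)**q, integrated.
        total += (-1) ** k * comb(q, k) * (upper ** power - lower ** power) / power
    return total


def bayes_probability(p, q, a, b):
    """Probability that the chance of success lies between a and b,
    after p successes and q failures, all values being equally likely."""
    return integral_xp_1mx_q(p, q, a, b) / integral_xp_1mx_q(p, q, 0, 1)


# One success and no failure: chance that the probability is below 1/2.
print(bayes_probability(1, 0, 0, Fraction(1, 2)))  # 1/4
```

The denominator is the Beta function integral mentioned in the text; for p = 2 and q = 3, for instance, the routine gives 1/60.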

If any single person has to be accorded the merit of
synthesizing the development of the subject at this stage, that
person is Laplace (1749-1827) who, living and working
throughout the revolutionary period, drew together the theoretical and
philosophical conclusions which had emerged from the problems
of gaming on the one hand, and from the discussion of
experimental errors, on the other. In addition Laplace established the
connexion between these and the corresponding questions in 
mortality and life tables which lie at the basis of insurance 
statistics. It is here also that the first specific statement of the 
Error Function is formulated; and although it was later
discovered independently by Gauss (1809) we can accept the view
that all the essentials of probability theory and most of the 
deductions from it are contained in Laplace’s great synthesis. 
From this time onwards it was inevitable that developments in 
any one of the fields—philosophical, logical, mathematical and 
experimental, industrial, financial, actuarial and statistical— 
were bound to affect each other and to grow from the same 
broad principles. One of these principles, established by Laplace,
is the method of Least Squares, which he deduces from a
set of very general assumptions. He shows, in fact, that if we
suppose the mean of a set of observations to be the most probable
value, and positive errors to be as likely as negative ones,
the error function for the observations is of the form
$\frac{h}{\sqrt{\pi}} e^{-h^2 x^2}$.
Actually, the method of Least Squares had previously been
used in astronomy by Euler and Gauss, but Gauss was the first 
who endeavoured to justify it by an appeal to probability 
theory. 

The beginning of the nineteenth century marked a change of 
profound importance, if not in mathematical methods, at least
in the subjects to which these methods were applied. The
Industrial Revolution had already set in, with its modern
problems of factory production and increasing populations;
from these emerged a vast array of social problems which in
response to a slowly developing public conscience were becoming
the subject of closer and more refined statistical investigation.
Thus 1801 saw the initiation of the English population
census. A short time later the growing Trades Union movement
began to maintain a continuous index of unemployment figures
among its members. Simultaneously, under the drive of industrial
needs, and with the funds allotted in universities and elsewhere
to experimental studies, scientific investigation proceeded
apace and with it a whole range of new problems emerged. 

In a sense science was, however, largely in the engineering 
phase, and while questions of experimental error were still
discussed, the scientific outlook was highly mechanistic, with little
regard for any consideration of statistical qualities in Nature. 
But the Industrial Revolution, which brought about an immense 
increase in production, was one of the driving forces towards 
foreign trade; here, then, on the side of insurance a new impetus 
was given to the development of the subject, in a field where 
mechanism had no place and average changes were the qualities 
that required study. We therefore find during this period a 
development of those methods of a statistical nature which are 
required in commercial expansion and social investigation. 

Nevertheless, experimental work was proceeding on chemical 
and physical principles; in particular, interest was focused on 
the characteristics of gases and gas mixtures, and the pressure 
laws governing them (possibly under the influence of the new
uses for illumination to which inflammable gas was being put).
As early as 1660 Boyle had discovered his gas law from entirely
experimental considerations; the idea that a gas, impinging
on an obstacle, consists of individual particles, and that
the pressure it exerts results from this mutual impact, had
been noted many centuries before, and various unsuccessful 
applications of the idea had already been made by Newton. In 
1738 Daniel Bernoulli showed that Boyle’s Law follows from 
the hypothesis that the gas consists of a large number of moving 
particles, and that the pressure arises simply from that exerted 
by the gas on the walls of the containing vessel. So the position 
remained until the turn of the century when, as we have
indicated, attention was drawn to the properties of gas mixtures.
Thus, in 1802, Dalton enunciated his law for the pressure of 
gas mixtures, basing it on the tacit assumption that the motion 
of all the particles involved was uniform. By the middle of the 
century Clausius (1847) and Joule and Kronig (1857) had shown 
how to express the pressure in terms of the mean velocity of 
the gas particles. 

Meanwhile, the philosophic problems associated with
probability, which had emerged from the writings of the
Encyclopédistes, were being examined and extended by De Morgan,
Venn, Boole, and others. The law of Laplace-Gauss was well
accepted as the necessary distribution function for a
combination of ‘random’ factors. By 1860 Maxwell was therefore in a
position to apply these ideas to the random motions of gas 
molecules, and from this there rapidly developed an elaborate 
statistical theory of gases. 

We should note that this marks a culminating point in the 
theoretical development, in the sense that we have presented 
a new class of problem in scientific method. For, by his 
analysis, Maxwell showed how the characteristics of a large 
mass and the laws exhibited by it in various circumstances are 
related to the corresponding characteristics of particles at a 
‘lower’ level. Although since that date many fruitful
developments of Maxwell’s theory have occurred, the next stage in its
application was not until the beginning of the twentieth century, 
when the experimental discovery of still more elementary forms 
of matter (electrons, protons, neutrons) threw up a similar type 
of problem for study: namely, how to express the
characteristics of the atom or molecule in terms of the more elementary
characteristics of the electron, proton, etc., on the assumption
that these show themselves as the result of statistical
combination. We may put it shortly by saying that the step from
Newtonian theory for the motion of a body to Maxwell’s theory 
for the characteristics of a gas is similar in type to the step from 
atomic characteristics to the quantum theory. 

It remains to point out that there exist at the present day 
groups of investigations of a statistical nature arising from 
insurance, actuarial analysis; industrial statistics and their 
application to production and distribution; the statistics of new 
social problems and the statistical approach to questions in 
purely scientific inquiry, including genetics, quantum mechanics 
and mathematical logic, and these, admittedly requiring 
specific treatment, are usually dealt with as if they were 
separate and distinct fields. All these developments require a 
new unification and synthesis, such as was performed by Laplace 
in his day; the efforts that have been made to this end, merely 
by the production of a theory of probability as an extended 
branch of logic instead of as an actual and vital part of scientific 
process, must, when seen in perspective with this historical 
movement, fail in their function. That unification has yet to 
be found.



CHAPTER II 


THE SCOPE OF PROBABILITY 

1. The meaning of chance

All events in the universe are interrelated and affect each other 
to a greater or less degree; for example, the reader of this book 
will be affected by all the factors which brought the book into 
existence, and these range from the manufacture of paper and 
ink on the one hand, to the history of the authors, their parents 
and teachers, on the other. Thus all events have an enormous 
number of causes, some more important than others. It follows 
that, in any attempt to obtain information about them, some 
selective principle is necessary in order to eliminate what we 
suppose will turn out to be the less relevant facts in a particular 
case; indeed, by the mere use of the word ‘event’, we are
focusing our attention on the thing that interests us, all other things
being for the moment irrelevant. 

Science is concerned with particular kinds of events which 
interest us. The procedure which characterizes scientific method 
consists in isolating rational sequences of events, that is, events 
which appear to form a logical chain when interpreted in the 
light of certain fundamental assumptions. Thus, a ball is
projected into the air with a specified speed: it rises to a certain
height and reaches the ground at a certain distance from the 
point of projection. A scientific study of this projectile attempts 
to connect this sequence of events so that one or more of them 
follow as a logical conclusion from the others. For this purpose, 
in the first place we ignore all other events except these, e.g., 
we ignore the temperature of the atmosphere, the possible 
defects in the apparatus used for the projection, and the 
personal views of the experimenter; and in the second we assume 
the operation of some guiding principle, frequently described as 
a ‘law of force’. Such a problem belongs to the science of 
rational mechanics, which by postulating laws of force purports
to deduce mathematically the effect of a given system of forces
acting on a given system of bodies. In other words, one of the
aims of mechanics, as of any other branch of science, is
prediction. (What interests us, in a sequence of events, is the way
in which they can be grouped together to facilitate prediction
and thus effect control over Nature.) The accuracy of the 
prediction will consequently depend not only on the selection 
of events, but also on the guiding principle which we were led 
to make in formulating our science. We are not justified in the 
first instance in assuming that this process will lead to results 
which agree with observed facts; if, however, we wish to 
sharpen the accuracy of our prediction, it is clear that we can 
do so by making a study of those events which we rejected
previously as being less relevant to the problem. This might also
necessitate a change in our guiding principle. This sharpening 
process may be repeated again and again. At any stage we define 
the difference between the event predicted and the actual observed 
occurrence as a chance effect. While one field of science, which 
we have called a rational system, occupies itself with
predictions which involuntarily exclude or ignore these chance
differences, another field takes them as its object of study, 
under the name of ‘deviation’ or ‘experimental error’. It is 
with this that the calculus of probability is concerned in its 
application to experimental practice. 

For certain purposes of analysis, a guiding principle is here 
again frequently assumed, when ‘chance’ is conceived as itself 
the result of a large number of equal elementary causes com¬ 
bined together. In the examination of this theory, points are 
often illustrated by models and analogies such as those dealing 
with balls chosen from urns, each such choice being thus re¬ 
garded as a simple, elementary event. 

We must begin our study with a word of warning. The 
abstract theory of probability, which seeks to comprehend those 
facts which elude the ordinary rational systems, must itself 
of necessity be a rational system, working by mathematical 
methods and based on certain assumptions. So it frequently 
happens that problems which appear to be about physically 
real things, such as balls extracted from an urn or a coin tossed 
in the air, have nothing specifically ‘real’ about them, in rela¬ 
tion to balls and urns: they are simply abstractions fitted into 
a picture to assist the mathematician. The justification for 
using such abstractions in our problems cannot rest finally on 
any theoretical basis alone, but in the last analysis has to be 
found from the experimenter before or after the abstractions 
have been applied. In any case we must distinguish between the 
mathematical problem of choosing a mathematical ball from a 
mathematical urn—an imaginary problem—and the actual urn, 
the balls contained in it, and the actual process of choice. The 
former may guide us in analysing the latter. 

An example will make the need for this distinction clear. 
If 100 persons each have to choose a number between 0 and 9 
inclusive, how often will the numbers 0, 1, 2, ... be chosen? 
The abstraction which a mathematician might make from this 
problem would leave him with a purely mathematical question 
concerning arrangements, the answer to which is, that each of 
the numbers will ‘probably’ be chosen ten times. But this is 
not the real question; what we want to know is how people 
actually choose, and here we are faced by considerations of a 
psychological and social nature. In point of fact it has been found 
by actual testing of a large number of individuals that 7 and 3 
are much more frequently chosen than any other number; these 
numbers both, of course, have a long historical and religious 
tradition behind them. As we see from such an example, the 
question whether the abstraction may be validly applied in a 
given case is not to be begged. The mathematical problem deals 
with the number of arrangements that can be conceived as 
possible in the circumstances, the physical problem with the 
groups of these which actually come into play. We can develop 
a mathematical theory of arrangements but a separate justifica¬ 
tion has to be found for it if it is to have practical applications. 
Thus, the mathematician may postulate that ‘an event can 
happen in two different ways’; whereas the physicist knows 
that it does happen in one way only. 
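
The contrast between the mathematical answer and observed behaviour can be sketched in a few lines of Python. Only the uniform expectation of ten choices per digit comes from the text; the tallies below are invented purely for illustration and are not the test results referred to above.

```python
from fractions import Fraction

def expected_uniform_count(persons: int, digits: int = 10) -> Fraction:
    """Count per digit under the mathematical abstraction (every digit alike)."""
    return Fraction(persons, digits)

def deviations(observed: list[int]) -> list[Fraction]:
    """Observed tally minus the uniform expectation, digit by digit."""
    expected = expected_uniform_count(sum(observed), len(observed))
    return [count - expected for count in observed]

# Hypothetical tallies for the digits 0..9 chosen by 100 persons (illustrative only).
hypothetical_tally = [4, 6, 8, 17, 9, 10, 9, 22, 9, 6]

print(expected_uniform_count(100))     # the mathematical answer: 10 per digit
print(deviations(hypothetical_tally))  # what the abstraction leaves unexplained
```

The first function answers the question about arrangements; the second needs actual tallies, which only an experiment can supply.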

In the above problem we recognize two questions inherent in 
the theory of probability: a mathematical question concerning 
possible arrangements, and a physical question concerning 
actual choice or action. There is also a third kind of problem 
which we now consider. Most human beings, even if they are 
not scientists, analyse events in a rational way, that is, they 
recognize order and recurrence and are so led to develop a sense 
of expectation as a subjective reaction. If we study a person 
scientifically we may ask whether his expectation of an event 
is justifiable, that is, whether his past experience is sufficient to 
produce the expectation that would correspond to the reality 
which the future will bring forth. For instance, if one wakes up 
in the morning and hears a cart rattling in the street, there 
comes the thought, ‘I expect that is the milkman’, or else ‘It is 
probably the milkman’. Obviously it is either the milkman or 
it is not, and it is one’s past experience, in which one’s expecta¬ 
tion has sometimes been verified and sometimes not, that deter¬ 
mines the strength of the expectation. Whether that expecta¬ 
tion will be verified or not will depend on how far our psycho¬ 
logical reactions conform closely to the underlying processes of 
the external world. We see, then, that such a question is not to 
be decided by a study of all the possible arrangements which the 
future may conceivably bring forth: we cannot thus be sure, 
without elaborate investigation, that psychological expectation 
is itself a sure guide to future occurrence. 

To sum up: in our analysis of situations relevant to ‘proba¬ 
bility’ we have discovered three possible fields of study, all in 
some way interrelated and each a partial approach to the general 
problem: 

(1) a mathematical theory of arrangements; 

(2) the frequency of actual occurrences; 

(3) the psychological expectation of a participant. 

Problem (2) is the one which arises in actual practice, when 
in describing the course of past events we attempt to predict 
the future: in this respect it does not differ from every other 
experiment, which is always concerned with the past as a guide 
to the future. Problem (1) is a mathematical discussion of 
abstractions which may be useful in (2) if they are shown to be 
relevant; while (3) represents the subjective state of a person 
who possibly makes a rough use of (1) and (2) when he is faced 
with the events in (2). 

In (1) the conception and practice of chance do not occur: 
every problem must be precisely defined and has a precise 
answer. For example, we may ask, out of a pack of 52 cards, 
what proportion of all possible groups of 13 will contain 4 aces? 
Here no question of chance arises. In such a problem the exact 
number of cards, and the kind of hand, are specified: there are 
no ambiguities in the situation—the 52 cards and the 4 aces 
are isolated in an abstract way from all the rest of the universe: 
in short, they are given. Any actual process of selection is 
deemed irrelevant, and the answer is unique. Precisely the 
same situation arises with a geometrical problem: thus, we are 
given a triangle with certain properties and we proceed to 
deduce certain consequences. On the other hand, chance, as we 
have defined it, enters into (2), and again in (3), since the 
individual concerned makes his own analysis which is necessarily 
partial; but what is chance to him need not be chance to the 
scientist engaged with problem (2). 
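
The card question has a precise answer that can be computed by counting alone. The following is a minimal sketch; the function and parameter names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

def proportion_with_all_aces(pack: int = 52, hand: int = 13, aces: int = 4) -> Fraction:
    """Proportion of all possible hands that contain every ace."""
    # A hand holding all the aces is completed by choosing the
    # remaining cards from the non-aces.
    favourable = comb(pack - aces, hand - aces)
    total = comb(pack, hand)
    return Fraction(favourable, total)

p = proportion_with_all_aces()
print(p)         # 11/4165
print(float(p))  # the same proportion as a decimal
```

No process of dealing or shuffling appears anywhere in the calculation, which is the point made above about problem (1).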

Chance in Scientific Observation 

A scientific observation depends not only on instruments but 
on the circumstances in which they are used—for example, the 
individual who performs the experiment, the temperature of 
the laboratory, and so on. Hence the results depend, to some ex¬ 
tent, on the differences between individuals. The object of all 
scientific experiment is to obtain objective information about 
the world: by objective information we mean information that 
can be stated in a form independent of the particular experi¬ 
menter and his idiosyncrasies. We call this information in¬ 
variant to the individual. 

Suppose that we wish to measure the length of a desk: whatever 
definition of ‘length’ we may adopt, if it is to be of any use 
for scientific purposes it must be invariant to the observer. But 
one observer applies a measuring rod to the desk and finds that 
it records 25.1 inches, another finds instead the reading 25.2 
inches, a third 24.9 inches, etc. What then is the length of the 
desk? At the end of such a series of observations a scientist has 
in his possession a set of numbers, which represent all the 
measurable information that he can obtain for his purpose. He 
has then to say to which, if any, of his numbers the term ‘length’ 
will be applied; the differences between the selected number 
(the ‘length’) and the rest he assigns to ‘chance’. They are 
presumably due, among other things, to the observer who, so 
far as an invariantive statement is concerned, is a chance one, 
irrelevant to the issue. The chance differences are said to be 
‘errors of observation’; but in effect such a term is simply a 
means of grouping together all that remains after the rational 
abstraction, which has been called ‘length’, has been made. In 
this way the idea of chance becomes identified with the cause 
of so-called ‘experimental errors’: the one implies the other. 
The definition of length really specifies the method of isolating 
the experiment from the rest of the universe in an attempt to 
obtain objective information and to build up a logic of science: 
the ‘errors’ represent the real connexion (or part of it) between 
the isolate and its residue with respect to the universe. 
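
The separation of a ‘length’ from its ‘errors of observation’ can be illustrated with the three readings quoted above. The sketch assumes that the arithmetic mean is the number selected as the ‘length’; the text leaves that choice open, so the mean is an assumption made here only for illustration.

```python
from statistics import mean

# The three readings quoted in the text, in inches.
readings = [25.1, 25.2, 24.9]

# Assumption: the arithmetic mean is the number to which 'length' is applied.
length = mean(readings)

# What remains after the abstraction has been made: the 'errors of observation'.
errors = [reading - length for reading in readings]

print(round(length, 3))
print([round(error, 3) for error in errors])
```

A different rule of selection, such as the median, would give a different ‘length’ and therefore a different set of ‘errors’; the errors are defined only relative to the abstraction adopted.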

From the illustrations we have given it will be observed that 
the difference between the mathematical and the physical 
approach to a problem is that, whereas in the former the field 
of discourse is defined in advance, in the latter the primary 
object of our inquiry is to find it. A physicist who is studying 
the properties of matter discovers that it can be broken down 
into electrified particles; thus he has now found a field of 
investigation. The mathematician can now begin his analysis 
with the statement: Given two isolated electrified particles interacting 
in a given way, can their future behaviour be predicted? 
Such behaviour can then suggest a new field of investigation 
to the experimenter who, unlike the mathematician, is never 
‘given’ two isolated electrified particles. 

Thus in the one case a mathematical field is postulated and 
we examine its logical implications: in the physical problem 
the make-up of the world itself is the unknown, and the object 
is to discover what in fact is its structure. In practice, however, 
both physicists and mathematicians work hand-in-hand and 
supplement each other, as shown in the above example. The 
subject of probability, therefore, to be complete, has to play 
its part in both fields; the mathematician has to forge an instru¬ 
ment which the experimenter can use in practice. 

2. On the definition of probability 

Definition of Mathematical Probability 

We propose in the first instance to define ‘probability’ in a 
purely mathematical sense, that is, in connexion with problem 
(1). The definition we give is the following:† 

‘If there is a group of $N$ letters consisting of $n_1$ letters $a_1$, $n_2$ 
letters $a_2$, ..., and $n_r$ letters $a_r$, the probability of a letter specified 
as belonging to the whole class $a_1, a_2, \ldots, a_r$ being a letter 
$a_s$ is $n_s/N$.’ 

† See also Peano, Rend. Accad. Lincei (5), 21 (1912), 429. 

Having posed this definition we may legitimately ask whether 
it may be applied in a particular problem, that is, whether the 
definition of probability has any relevance to an actual experi¬ 
mental case. For instance, if a penny is tossed, what is the 
probability of a head? We can construct a model problem 
which we imagine to be like it by making the word ‘head’ correspond 
to the letter $a_1$ and the word ‘tail’ to the letter $a_2$, whence 
we obtain a mathematical solution; we replace the real penny 
and the action of tossing by two arrangements which we may 
call either ‘head and tail’ or ‘$a_1$ and $a_2$’: in this way the actual 
penny no longer concerns the mathematician. 
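
The definition of mathematical probability is a pure counting rule, and a short sketch makes that plain. The function name and the sample groups are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction

def mathematical_probability(letters, specified) -> Fraction:
    """Return n_s / N: the share of the group `letters` that is `specified`."""
    counts = Counter(letters)  # n_1, n_2, ..., n_r
    return Fraction(counts[specified], len(letters))

# The model of the penny: one letter 'head' and one letter 'tail'.
penny_model = ["head", "tail"]
print(mathematical_probability(penny_model, "head"))  # 1/2

# A group of six letters containing three of the letter 'b'.
print(mathematical_probability("aabbbc", "b"))        # 1/2
```

Nothing in the function refers to tossing, to frequency, or to likelihood; whether the model applies to a real penny is a separate question.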

But this gives us no definition of probability of a physical 
event, such as the tossing of a coin: our knowledge of such an 
event implies knowledge of the circumstances in which the coin 
is tossed. An experimenter who studies the problem might ask 
himself how frequently the head appears: he might study the 
detailed process of tossing but whether he is entitled to make a 
precise prediction on that is another matter. An onlooker might, 
if interrogated, reply that heads and tails are equally likely: his 
answer emerges from a collective experience, a result of having 
seen actual pennies spun. To bring the term ‘equally likely’ into a 
mathematical definition would be to confuse (3) with (1), just as 
to use the term ‘equally frequent’ would be to confuse (2) with (1). 

Definition of Statistical Probability 

We define a class of event by a distinguishing quality of that 
class, e.g. the event known as traffic accidents; these grow in 
number with time and will be referred to as a population of 
traffic accidents. At any given moment the ratio of fatal cases 
to the total number has a certain value which itself in general 
varies with time; this ratio we call the statistical probability of 
fatal accidents. Its importance lies in the practical fact that 
it is used either as a guide to prediction concerning the number 
of such cases in the future, or as a factor in determining how 
we can attempt to diminish them. 
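
Because the population grows with time, the statistical probability is a ratio that must be recomputed as records accumulate. The sketch below shows this with invented counts; the figures are hypothetical and serve only to show the ratio changing.

```python
from fractions import Fraction
from itertools import accumulate

def running_statistical_probability(records):
    """Ratio of fatal cases to all cases, recomputed as the population grows.

    `records` is a list of (fatal, total) counts for successive periods.
    """
    fatal_so_far = accumulate(fatal for fatal, _ in records)
    total_so_far = accumulate(total for _, total in records)
    return [Fraction(f, t) for f, t in zip(fatal_so_far, total_so_far)]

# Hypothetical counts for three successive periods (illustrative only).
hypothetical_records = [(12, 400), (9, 380), (15, 450)]

for ratio in running_statistical_probability(hypothetical_records):
    print(ratio, float(ratio))
```

Each printed value is the statistical probability at one moment; none of them is deduced from a fixed set of arrangements.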

We note two points of difference between this definition and 
the preceding. In the latter the population of events whose 
arrangements we considered was in general finite and determined, 
and the probability of any subclass was a matter for 
deduction. In the case of statistical probability, both the popu¬ 
lation and the subclass, although defined in Nature, are not 
initially bounded in extent and, in fact, they grow with time. 
The significance of the probability lies in its application to 
future members of this growing class; thus the application is 
essentially of an inductive nature. This is not to say that the 
two methods of approach have nothing in common; when we 
come to discuss what is called the significance of a statistical 
probability it will be found that the mathematical definition 
affords us an idealized standard against which the significance 
may be measured. Statistical probability finds its application in 
many branches of insurance, in the analysis of demographical 
statistics, and plays a part in such natural phenomena as 
meteorology, where the deductive methods of physical science 
are not yet sufficiently precise to enable satisfactory predictions 
to be otherwise made. 

A priori Probability 

There is a form of statistical probability which appears in 
the literature of the subject under the name of a priori prob¬ 
ability. Let us suppose, for example, that we are examining 
the probability of an individual being killed by traffic in the 
streets of a busy town. Although the actual data from which 
the statistical probability curve could be drawn are not avail¬ 
able, it is nevertheless possible from general considerations 
based on our knowledge of the circumstances and the impres¬ 
sions we have gained from others’ experience, to construct a 
probability curve which will at any rate serve as a first approxi¬ 
mation to the truth. Thus we know that between 8 a.m. and 
10 a.m. many people are in the street on their way to work, and 
that between 4 p.m. and 6 p.m. they are returning home. 
Moreover, we may expect that the ordinary traffic of the day 
is also augmented during those periods by the cars belonging 
to business men. Accordingly, most people would agree in pro¬ 
ducing a curve like that on the following page. From this we 
can determine an a priori probability; its significance lies in the 
fact that, if we wish to use it, it gives us a first criterion for 
judging whether a batch of fatal accidents occurring, say, 
between 2 and 3 in the afternoon can be regarded as normal 
or not. Thus it enables us to make a first rough estimate of the 
probability of obtaining such a sample. 

It is, of course, admitted that the details of the probability 
curve in the figure will vary with the person who constructs it, 
but there are cases in which no such differences arise. For 
example, if a penny is tossed, all will agree that on the basis of 
past experience the a priori probability of obtaining a head is 
1/2. This does not rest solely on the mathematical ground that a 
penny has a head and a tail, but on the additional fact that 
pennies do indeed, on the average, fall with equal frequency on 
head and on tail. If the general experience of tossing coins were 
sufficiently exact and had shown that in fact heads appeared 
51 times in 100, the a priori probability would be accepted as 
51/100. We shall see later, when dealing with Bernoulli’s Theorem 
on the mathematical probability of obtaining certain propor¬ 
tions in a given sample, that a knowledge of the proportions 
in the original population is essential for the solution. There 
we shall refer to it as the probability of an individual member 
of that population; but in applying the conclusions to samples 
drawn from it we must bear in mind that the probability in 
question is merely a precise form of the a priori probability 
which we have been considering here. 
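
To see why the proportion in the original population matters, one may compare what the two values mentioned above, 1/2 and 51/100, imply for a sample of tosses. The sketch below assumes the usual binomial model (independent tosses, each with the same probability of a head); that model and the function name are assumptions made here for illustration, not a statement of the theorem as it is treated later in the book.

```python
from fractions import Fraction
from math import comb

def prob_heads(k: int, n: int, p: Fraction) -> Fraction:
    """Probability of exactly k heads in n tosses under the binomial model."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Compare the two a priori values discussed in the text.
for p in (Fraction(1, 2), Fraction(51, 100)):
    print(p, float(prob_heads(5, 10, p)))
```

The calculation cannot begin until a value of `p` is supplied, and that value is precisely the a priori probability under discussion.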

Probability as a Branch of Logic 

The subject of probability is approached by many writers from 
a different angle, viz. as an extension of a branch of logic. A set 
of facts, called the ‘data’, are stated, and a proposition referring 
to them is set alongside them; among the numerous relations 
that might be stated between the proposition and the data we 
consider one type in particular. While we usually assert that the 
proposition is either a true or a false statement about the data 
(it is certainly true if the data imply the proposition), an inter¬ 
mediate state may be considered. A class of 30 children may be 
the data and the proposition, ‘All these children have brown 
eyes’. In this illustration the restricted form of the data tells us 
nothing about the children’s eyes: the proposition is therefore 
not implied by the data, but is nevertheless not inconsistent with 
them. If further information were available it is possible that 
the proposition might be true; but as it stands, it outstrips the 
data. When such a situation arises it is said that the proposition 
has a ‘probability relation’ with respect to the data; the probabi¬ 
lity relation is then regarded as a member of a class of relations, 
the extremes of which are ‘true’ and ‘false’. We may say that 

‘A proposition is true’, or 

‘A proposition has a probability’, or 

‘A proposition is false’. 

It will be noticed that this approach to probability suggests 
that it is primarily psychological; if it were purely logical there 
would be no escape from the position that the proposition is 
either implied or not implied by the data. It is when the pro¬ 
position and the data are not thus rigorously bound together 
that the psychological attitude enters into the question. We 
feel that although the implication is not logically complete, 
nevertheless if further data were available the proposition would 
be found to be true. Thus the probability relation implies that 
when the proposition is used for enlarging the data it may be 
found to be true; this views the probability relationship as a 
step towards the accumulation of further data and the final 
establishment of a truth or falsehood: otherwise it remains 
artificially separated from its function. 

Consider the above illustration: to say that there is a proba¬ 
bility that the 30 children all have brown eyes is futile unless 
we go on to discover whether they have, or what proportion of 
them have brown eyes. When this step has been taken, the final 
data imply the truth or falsehood of the original proposition. 
This interpretation of a probability relation gives it a value 
in scientific method. If, however, we attempt to give it a value 
in itself by isolating it from the necessary part that it should 
play in scientific method, the subject may be indeed developed 
further but necessarily not on the present lines. To appreciate 
this fact we must return to the concept of expectation: given a 
set of data and a proposition which outstrips them, each 
individual, on the basis of his past experience, has a sense of 
expectation that, if further data were accumulated, the proposi¬ 
tion would be verified. A group of experimenters of wide experi¬ 
ence in the particular field, i.e. on the basis of previous data 
not here specified, would presumably agree that they strongly 
suspected or rather expected the proposition to be true, or 
thought it might be true. They may thus find themselves agree¬ 
ing that a gradation in the sense of expectation is associated 
in their minds with the possible truth of a proposition. To 
proceed further along scientific lines some objective measure of 
expectation must be found, otherwise the theory as so con¬ 
stituted cannot come within the range of physical science. It 
is possible that the expert psychologist might find such a 
measure, by examining the reactions of the experimenters, but 
not directly from the data. A statistician might find such a 
measure, but he would derive it from the data alone, and not 
from the experimenters’ sense of expectation. 

An attempt to overcome the difficulty respecting the non-metrical 
nature of probability when approached in this way 
has been made by laying down the following axioms:† 

‘1. If we have two sets of data p and p′, and two propositions 
q and q′, and we consider the probabilities of q given p, and of 
q′ given p′, then ... the probability of q given p is either greater 
than, equal to, or less than that of q′ given p′. 

2. All propositions impossible on the data have the same 
probability, which is not greater than any other probability; 
and all propositions certain on the data have the same proba¬ 
bility, which is not less than any other probability.’ 

† Jeffreys, Scientific Inference, ch. ii. A very similar artifice is adopted by 
F. P. Ramsey (Foundations of Mathematics, p. 158) but he retains a subjective 
criterion for the strength of a belief, so that his symbols have an entirely 
personal reference. See footnote, p. 27. 
Here by the phraseology used the sense of expectation has 
been given a status that is invariant to the individual and 
is attached to the objective situation; at the same time it is 
implied that this psychological probability is measurable. To 
circumvent such difficulties, what amounts to a verbal artifice 
has been adopted. The gradations in psychological expectation 
are identified with the real numbers, using instead of the word 
‘truth’ the word one, and writing it as 1; and using instead of 
the word ‘falsehood’ the word zero, and writing it as 0. By 
the use of this verbal method with the foregoing axioms all 
probabilities of this nature apparently become measurable by 
numbers lying between 0 and 1: thereafter it is a simple matter 
to derive the ordinary formulae for mathematical probability 
by setting out a series of theorems, such as: 

‘If several propositions are mutually contradictory on the 
data, the number attached to the probability that some one 
of them is true shall be the sum of those attached to the proba¬ 
bilities that each separately is true.’ 
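
In symbols, the quoted theorem may be written as follows. The notation $P(q \mid p)$ for ‘the number attached to the probability of $q$ on the data $p$’ is introduced here for illustration and is not taken from the source quoted; $q_1, \ldots, q_m$ are propositions mutually contradictory on $p$.

```latex
P(q_1 \lor q_2 \lor \dots \lor q_m \mid p) \;=\; \sum_{i=1}^{m} P(q_i \mid p),
\qquad 0 \le P(q_i \mid p) \le 1 .
```

Written thus, the rule is formally the same as the addition of the ratios $n_s/N$ for distinct classes of letters in a mathematical group.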

In this treatment the idea of psychological probability has 
been transformed merely by use of an analogous terminology 
into mathematical probability; the fact that psychological 
probabilities have been stated as numbers, which are additive 
and range between 0 and 1, would, if these statements were 
true, imply an elaborately detailed knowledge of psychological 
processes and their measurable qualities. In point of fact, of 
course, no such data are available. It follows that, after these 
assumptions have been made, the subsequent treatment of the 
subject cannot differ in essentials from that of ordinary mathe¬ 
matical probability; although the fact that it is artificially 
based on psychological ideas may have the effect of confusing 
the later interpretations. If it is necessary at all to emphasize 
the gravity of the assumption that psychological probability 
is measured by numbers lying between 0 and 1, it is, for 
example, sufficient to point out that one could equally well 
arrange that ‘truth’ should correspond to the colour blue, and 
‘falsehood’ to red, all intermediate colours in the spectrum 
being assumed to correspond, somehow or other, to intermediate 
states of feeling. Such an arrangement would imply the 
same type of fallacy even though, as it stands, it does not 
immediately involve assumptions of measurability, but merely 
those of correspondence. 

At this point it is desirable to add that, in refuting a purely 
psychological approach to probability, we are far from denying 
that that line of development is necessary. We have already 
said that the concept of probability itself marks a useful stage in 
scientific method—‘useful’ in the sense that it suggests the 
direction in which to seek and interpret data; it is the stage 
intermediate between partial ignorance and experimentally 
sufficient certainty. 

The Principle of Insufficient Reason 

In this connexion it is worth considering a method which 
various writers have evolved in order to arrive at an estimate of 
the a priori probability. It is commonly stated that if there is 
insufficient evidence to justify a probability assertion, the latter 
can be established by referring it to the ‘principle of insufficient 
reason’. Let us quote Jeffreys† on the subject: 

‘How do we assess the probability of a proposition before we 
have any means of knowing whether it is true or false? It has 
often been said that assessing a probability implies some 
knowledge, and that therefore we cannot assign a probability 
when we are in complete ignorance. This opinion must be 
directly contradicted. Complete ignorance is a state of know¬ 
ledge . . . and the probabilities assigned upon it are perfectly 
definite. If we have no means of choosing between alternatives, 
the probabilities attached to those alternatives are equal.’ 

To adopt this standpoint is to deny the whole basis of science. 
Science is based on knowledge, if only partial, and nothing 
whatsoever can be built on ignorance: without data no conclu¬ 
sion can be drawn. If the fundamental question of our subject 
can be stated in the form, ‘Given certain data in a given situa¬ 
tion, what precise deduction can be drawn from them?’ then 
the problem of drawing a deduction from no data does not 
fall within its scope. If we are in complete ignorance about an 
event, then we are in complete ignorance of how to estimate its 
probability. In this case the principle of insufficient reason 
asserts that the probability of its happening is 1/2, since the sum 

† Scientific Inference, ch. ii. 

total of our relevant knowledge may be stated in two mutually 
exclusive propositions which exhaust all the possibilities: 

The event happens. 

The event does not happen. 

But we must not, however, confuse the nature of the event 
and the data concerning it with these two verbal propositions. 
By hypothesis, we know nothing about the event; the above 
two propositions provide data for another problem, which may 
be stated simply as: 

‘Given that a certain statement belongs to a class of two 
statements, what is the probability that it is the first of 
these?’ 

If the principle of insufficient reason is used in this way, it 
tells us something about the arrangement of statements but 
cannot provide us with any estimate of truth-probability of 
their content. 

It might be maintained that in practice the principle is used 
in this way to assess the probability as 1/2 and to base action on 
the assessment. As an unqualified statement this is definitely 
untrue; when we are unable to estimate a probability, we may 
as a matter of convenience assume a tentative value of 1/2, but 
only as a matter of convenience in practice. Every illustration 
which can be produced, however, in which the principle appears 
to provide us with an estimate of probability in the sense stated 
above, turns out to be so constructed that by definition all 
relevant information that any one would know or immediately 
seek to discover is automatically excluded. Action is never 
taken on the basis of no information, and judgement, when 
it has to be applied, must be applied to some content of 
fact. 

As an illustration of an abstracted problem consider the 
following: 

AB is a line of unknown extent, XY is a segment of AB, of 
unknown extent and position. If P is a point situated in AB, 
what is the probability, we ask, that P lies within the segment 
XY? On the basis of the above principle the answer would be |. 
There is in reality no such answer, for we have insufficient data 
on which to make even an estimate of the probability, since the 
points A, B, X, Y are known only to exist on an infinite line. 
We can, however, state the three mutually exclusive propositions 
which together exhaust all the possibilities: 

P lies to the left of X; 

P lies in XY; 

P lies to the right of Y; 

and the probability that a statement which is known to be one 
of these three will be the second, is 1/3. In actual fact, no rational 
being would use such an estimate if, say, he were attempting 
to recover an article lost in a street AB in which XY was a very 
small section—even if it were the most brightly illuminated 
section. 

If indeed probability is to be used as a guide to action, as 
it must be if it is to play its part in scientific method, then the 
above illustration brings out the weakness of this approach. On 
this basis, the probability of finding the article in XY would 
be 1/3, whether the lamp is present or not; nevertheless, most 
people would proceed straight to the lamp, since its presence 
is more relevant to action than any abstract estimate of proba¬ 
bility based on mere verbal propositions. It seems clear that 
when a situation arises in which a priori probability can be 
estimated only by means of the principle of insufficient reason, 
this probability itself becomes insignificant as a guide to action, 
and other factors become much more important. 

Other Definitions of Probability 

In the light of the above discussion, it is worth while examin¬ 
ing the definitions that have been given by other writers, as a 
preamble to their mathematical treatment of the subject. 

James Bernoulli begins by defining probability as the measure 
of the strength of our expectation of a future event: this is 
clearly a case of (3), and Bernoulli’s treatment must, if consistent, 
lead to a mathematical theory of psychology. In spite 
of his initial definition, his analysis is carried through as if 
based on the definition (1) and his treatment becomes that of 
purely mathematical probability. 

According to J. M. Keynes,† probability is not concerned with 
events other than judgements or propositions; thus his treatment, 
although symbolical in form, is one of a non-measurable 
logic and rules out mathematics altogether in the accepted 
sense. 

† Treatise on Probability (1921). 

J. S. Mill,† quoting from Laplace, says: ‘Probability has 
reference partly to our ignorance, partly to our knowledge. . . . 
The theory of chances consists in reducing all events of the same 
kind to a certain number of cases equally possible, that is, such 
that we are equally undecided as to their existence; and deter¬ 
mining the number of these cases which are favourable to the 
event sought. The ratio of that number to the number of all 
the possible cases is the measure of the probability. . . .’ 

This is the definition to which Mill himself inclines, and is a 
confusion of at least two of our three concepts of probability; 
the confusion is complete when later Mill adds that ‘we must 
remember that the probability of an event is not a quality of 
the event itself, but a mere name for the degree of ground 
which we, or some one else, have for expecting it. The probability
of an event to one person is a different thing from the
probability of the same event to another, or to the same person
after he has acquired additional evidence. . . .’‡

From what we have already said it should be clear that 
Mill’s definition does not disentangle the various elements which 
enter into probability. For he is obviously thinking of (1) when 
the events are presumed, and of (2) when they are being formed 
in experimental practice. We have seen how important it is 
to distinguish between these two concepts; they are not interchangeable
although they may be mutually helpful. To take
the statistical definition, viz. the actual ratio of favourable, to 
the total number, of cases from a block of similar past events, 
as identical with the mathematical definition of probability 
would be to identify a number, which in general varies with the 
growing population, with a unique mathematical value which 
emerges from the definition of certain classes. 

† A System of Logic, 8th edition, Book III.

‡ Cf. Jeffreys, op. cit., p. 10: ‘A proposition . . . has one and only one
probability. If any person assigns a different probability, he is simply wrong.’
See also footnote, p. 22.

The various types of probability estimates may be illustrated
by the experiment of tossing a coin. We may say, as has already
been suggested, that the a priori probability of a head appearing
is a number drawn and posited from a wide but unspecified
past experience. We may say that the mathematical probability
is 1/2 on the grounds that there are only two possibilities,
head and tail, and that these are defined as having equal probability.
Or we may actually perform an experiment; thus in the
following table we give the results of tossing such a coin 100
times, and the number of heads recorded after 10, 20, 30, ...
tosses. It will be seen that the statistical probability ranges
from 0.46 to 0.65, and is therefore a function of the size of the
population.


No. of heads    No. of tosses    Ratio
      6               10         0.60
     13               20         0.65
     16               30         0.533
     21               40         0.525
     23               50         0.46
     28               60         0.466
     35               70         0.50
     43               80         0.537
     49               90         0.54
     55              100         0.55


It is thus seen, even at this stage, that yet another problem
suggests itself as of importance in interpreting such data as
those given above. If we associate the mathematical definition
of the probability of obtaining a head (namely 1/2) on any one
occasion, with the statistical probability as here defined, we
may inquire what is the mathematical probability that in the
first 100 tosses of a coin (probability of a head = 1/2) fluctuations
from 1/2 of this magnitude will occur. We shall deal with
this question in Chapter V; but for the moment it is important
to recognize how mathematical probability may be used to
interpret a fluctuating statistical probability.
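As a rough illustration of the kind of question just posed, the following sketch counts arrangements directly. It is only one possible reading of ‘fluctuations of this magnitude’: we take it, as an assumption, to mean that the ratio after 100 tosses differs from 1/2 by at least 0.05, as it does in the last row of the table. The function name `prob_heads_between` is ours and is not part of the treatment promised for Chapter V.

```python
from math import comb

def prob_heads_between(n: int, lo: int, hi: int) -> float:
    """Probability that a fair coin shows between lo and hi heads (inclusive) in n tosses."""
    favourable = sum(comb(n, k) for k in range(lo, hi + 1))
    return favourable / 2 ** n  # all 2**n sequences are taken as equally probable

n = 100
# Assumed reading: the final ratio departs from 1/2 by at least 0.05,
# i.e. 45 heads or fewer, or 55 heads or more.
p_within = prob_heads_between(n, 46, 54)
p_fluctuation = 1.0 - p_within
print(f"P(|ratio - 1/2| >= 0.05 after {n} tosses) = {p_fluctuation:.4f}")
```

The point of the sketch is simply that the mathematical definition supplies a definite number against which an observed fluctuation may be judged.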

This fluctuation is, of course, necessarily associated, as in the 
case of a coin, with the method of tossing. It is clear that with 
a given coin which is tossed by some mechanical process 
(beginning always with, say, the head upwards), it could be 
arranged that the result of each toss is always head or always 
tail; or, alternatively, that the ratio of the number of heads to 
the number of tails takes a certain series of values within a 
specified range. 

The above example illustrates the fact, which we shall
encounter frequently, that in any physical process to which
probability is to apply, there are three interlocked elements: 

(1) a ‘population’ P, in the above case, of heads and tails; 

(2) a process of selection S (here a mode of tossing); 

(3) a sample s drawn from P by the application of S. This
process may be stated symbolically in the form s = S(P). 

In the previous examples where the coin has been tossed 
100 times, it has been shown that, with the particular form of 
S used in the experiment, the sample s drawn contains numbers 
of heads in ratios lying between 0.46 and 0.65.

Some discussions of statistical probability, when they attempt 
to link it up with mathematical probability, try to do so by 
asserting that the ratio obtained by sampling a population can 
be made to lie within increasingly narrower limits merely by 
lengthening the process S. It seems clear, from what we have
said, that it is not simply the length, but also the form, of the
process that is of importance. The gap in the discussion will
not be bridged until it can be shown that there exists some
kind of process S which is capable of mathematical and
empirical definition, and of leading to such a result; any particular
process of this type could then legitimately be called a
‘random’ one, and the class of such processes would in such
circumstances identify the mathematical with the statistical
definition. That all processes S do not fall within this category 
is obvious from the fact that S can be deliberately designed so 
as to violate the required condition. 
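The dependence of the sample on the form of S, and not merely on its length, may be sketched as follows. The three processes are hypothetical stand-ins of our own devising: a tosser arranged always to give a head, one giving a fixed pattern, and a seeded pseudo-random generator taking the place of hand tossing. None of them is put forward as the ‘random’ process whose definition is in question.

```python
import random

def sample_ratio(process, n_tosses):
    """Apply a selection process S n_tosses times and return the ratio of heads."""
    heads = sum(1 for i in range(n_tosses) if process(i) == "H")
    return heads / n_tosses

def rigged_process(i):
    # Hypothetical mechanical tosser arranged so that every toss gives a head.
    return "H"

def patterned_process(i):
    # Hypothetical tosser giving two heads followed by one tail, repeatedly.
    return "T" if i % 3 == 2 else "H"

rng = random.Random(0)  # seeded pseudo-random generator; the seed is arbitrary

def pseudo_random_process(i):
    return "H" if rng.random() < 0.5 else "T"

for name, process in [("rigged", rigged_process),
                      ("patterned", patterned_process),
                      ("pseudo-random", pseudo_random_process)]:
    ratios = [sample_ratio(process, n) for n in (10, 100, 10_000)]
    print(name, [round(r, 3) for r in ratios])
```

Lengthening the first two processes narrows the ratio towards 1 and 2/3 respectively, not towards 1/2.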

The reader is advised to try this experiment himself, and to 
note that the ratios he obtains are different from those given 
above. 
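A reader without a coin to hand may imitate the experiment as below. This is a sketch only: a pseudo-random generator is assumed as a substitute for physical tossing, and the function name `running_ratios` and the seed are our own choices.

```python
import random

def running_ratios(n_tosses=100, step=10, seed=None):
    """Toss a simulated coin and record (heads, tosses, ratio) after every `step` tosses."""
    rng = random.Random(seed)
    heads = 0
    rows = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1
        if toss % step == 0:
            rows.append((heads, toss, heads / toss))
    return rows

rows = running_ratios(seed=1937)  # the seed is arbitrary
print("No. of heads  No. of tosses  Ratio")
for heads, tosses, ratio in rows:
    print(f"{heads:12d}  {tosses:13d}  {ratio:.3f}")
```

Changing the seed gives a different table, in the same way that a fresh series of tosses would.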

In his discussion of the subject, Coolidge† attempts to surmount
the breach between the mathematical and the empirical
approach (i.e. between (1) and (2)) by the following ‘empirical
assumptions’ of the type to which we have referred.

‘1. If an event which can happen in two different ways be
repeated a great number of times under the same essential
conditions, the ratio of the number of times that it happens in
one way, to the total number of trials, will approach a definite
limit as the latter number increases indefinitely.

2. If an event can happen in a certain number of ways, all
of which are equally likely, and if a certain number of these be
called favourable, then the ratio of favourable ways to the
total number is equal to the probability that the event will
turn out favourably.’

† An Introduction to Mathematical Probability (1925).

The first of these assumptions is devoid of mathematical
precision; first, because the question is begged by the phrase
‘the same essential conditions’. This is a phrase which commonly
occurs in all branches of mathematical physics. It is
often posed as a fundamental proposition in scientific method 
in the form, ‘The same experiment always produces the same 
results when carried out under the same conditions’. For our 
purpose it is important to note that no two experiments can be 
the same; invariably they differ in time or place, and almost 
invariably in experimenter and apparatus. This criticism 
applies also to the phrase ‘the same conditions’: the test for 
‘sameness’ in two cases is provided by the results, for these are
numbers which can be checked against each other. In the last 
analysis the test whether these conditions have in fact been 
fulfilled lies in the concurrence of certain intermediate and all 
the final results. Thus the proposition quoted is meaningless; 
it represents an effort to abolish a vital distinction between two 
concepts which differ fundamentally and is simply a concession 
to mathematical convenience. 

So much for the criterion of sameness in the first empirical 
assumption; in the second place, the assertion that the ratio 
approaches a ‘definite limit’ cannot be justified by any mathematical
definition of a limit. It has to be dealt with in the
manner already indicated. 

The second assumption is not an assumption at all, but a 
definition, as is indicated by the phrase ‘equally likely’. This is 
either an appeal to subjective psychology (under (3)) or a petitio 
principii, in that the measure of the probability, as defined, will
by its consistency indicate a criterion for ‘equal likelihood’. 

An interesting attempt has been made by Mises to erect a
theory of probability that would bridge the gap between the
classical mathematical and the statistical approach. The
former, as we have seen, is concerned with a given population
and confines its questions to those relating to the relative frequency
of various arrangements of the elements of that population.
Mises’s theory is in the first place a statistical one. He
confines his attention to the infinite succession of unit samples
as they are drawn from an unknown population or as they
are created by the repetition of a particular action (e.g. the
tossing of a coin), and the questions that are raised concern
the nature of the predictions that can be made regarding the
occurrence or non-occurrence of a particular kind of sample in
the sequence.

Let the symbols 1 and 0 be used to represent ‘success’ and
‘failure’, or ‘black’ and ‘white’, the two possible outcomes of
an action. Then Mises is concerned with a collection of the type
1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, ...
and proposes to define its structure in such a way as to provide
a reasonable meaning to the phrase ‘The probability of the
occurrence of 1’.

The nature of the definition of structure, however, must not
be such as to destroy the ‘random’ occurrence of the 1’s with
respect to the 0’s; in other words, there must be present a persistent
disorder. This implies, of course, that by no detailed
study of the system should it be possible, for example, for a
gambler to discover a pattern or law in the occurrence of the
symbols of such a form that he could arrange his gambling with
any certainty on the occurrence, say, of a 0 or a 1 at a series
of allotted positions.

To fulfil these requirements the sequence is restricted by the 
following two conditions: 

(1) If in the first n symbols, there occur m of the type 1,
then the sequence is such that

$$\lim_{n \to \infty} \frac{m}{n} = p.$$

The probability of the occurrence of 1 in the sequence is defined
as p.

This statement may be put in a form more usual with the
treatment of sequences; thus corresponding to any small number
ε it is possible to find a number of terms N, beginning from
the left, and a number p, such that for all values of n ≥ N the
ratio m/n will continue to differ from p by less than ε.




(2) The second condition is the Principle of Disorder. It 
demands further that the sequence shall be of such a nature 
that by whatever system or law, related to the order of the 
terms, a new sequence be formed from all or some of the terms 
of the original, all such derived sequences shall separately 
satisfy the previous condition with the same value of the
probability p.
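The effect of the Principle of Disorder may be sketched on finite stretches of two sequences. The sketch cannot, of course, test a limit; it merely compares relative frequencies in derived sequences chosen by rules of position. The names `select_by_position` and `relative_frequency`, the three rules, and the use of a seeded pseudo-random generator are all assumptions of ours, not part of Mises’s construction.

```python
import random

def relative_frequency(bits):
    """Proportion of 1s in a finite list of 0s and 1s."""
    return sum(bits) / len(bits)

def select_by_position(bits, rule):
    """Derived sequence: keep the term at position i (counting from 1) whenever rule(i) is true."""
    return [b for i, b in enumerate(bits, start=1) if rule(i)]

rng = random.Random(2)  # arbitrary seed
n = 60_000
random_bits = [rng.randint(0, 1) for _ in range(n)]
alternating_bits = [i % 2 for i in range(1, n + 1)]  # 1, 0, 1, 0, ...: frequency 1/2, but orderly

rules = {
    "every term": lambda i: True,
    "odd positions": lambda i: i % 2 == 1,
    "every third position": lambda i: i % 3 == 0,
}

for name, rule in rules.items():
    f_random = relative_frequency(select_by_position(random_bits, rule))
    f_alternating = relative_frequency(select_by_position(alternating_bits, rule))
    print(f"{name:22s} pseudo-random: {f_random:.3f}   alternating: {f_alternating:.3f}")
```

The alternating sequence satisfies condition (1) with p = 1/2, yet the selection of odd positions yields nothing but 1’s; it therefore violates condition (2).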

At this stage two points should be noted. Condition (1) at 
first sight bears a very close similarity to assumption (1) of 
Coolidge (p. 29). 

Here we may remark, however, that whereas the latter took 
the statement of convergence as an empirical assumption applicable
to real statistical data, in the present case the condition
of convergence is merely a restrictive property of the collection 
to be considered. The question whether sequences satisfying 
such a condition do embrace actual statistical data empirically 
derived remains open. 

The second criticism may in a sense be much more serious. 
The Principle of Disorder, applicable as it must be to every 
systematically derived sequence, must impose very drastic restrictions
on the original. It has been claimed, in fact, that if
conditions (1) and (2) are not actually inconsistent (in which
case the class of sequence defined would be empty), there cannot
be any wide range of types that satisfy both requirements
and that therefore the application to actual statistical data is 
seriously restricted. 

A sequence satisfying the foregoing two conditions is termed
by Mises a Collective, and the purpose of his investigation is
to show, if possible, that the fundamental theorems of mathematical
probability, viz. the Addition, Multiplication, and Bernoulli
Theorems,† all hold for a Collective. It would then be
possible to state under what conditions these theorems might
be validly applied to the analysis of a statistical system.

† See pp. 49, 51, 58.

In the pursuit of this objective great mathematical difficulties
have been experienced. To establish the multiplication theorem
a special definition has to be made to cover the case of two
Collectives that are mutually disorderly. In effect this is met
by the requirement that by no systematic transformation can
the one Collective be transformed into the other. On the other
hand, to establish Bernoulli’s Theorem Mises has to apply
the Principle of Disorder already referred to, not only to
sequences transformed according to some law of position, i.e.
according to some specific function of n, but also to all those
that can be derived by applying any regular rule to localized
qualities in the Collective, e.g. deriving a sequence by choosing
the numbers that are two places to the left of each 1.
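The rule just quoted, of choosing the numbers two places to the left of each 1, can be tried on finite stretches of sequences. The following is a sketch under our own assumptions: the function names, the seed, and the two test sequences are illustrative only, and a finite check is no proof of the limiting property.

```python
import random

def two_left_of_each_one(bits):
    """Derived sequence: for every 1 at index i >= 2, take the term at index i - 2."""
    return [bits[i - 2] for i in range(2, len(bits)) if bits[i] == 1]

def relative_frequency(bits):
    """Proportion of 1s in a finite list of 0s and 1s."""
    return sum(bits) / len(bits)

rng = random.Random(3)  # arbitrary seed
random_bits = [rng.randint(0, 1) for _ in range(60_000)]
periodic_bits = [1, 1, 0, 0] * 15_000  # frequency of 1 is 1/2, but with an obvious law

for name, bits in [("pseudo-random", random_bits), ("periodic", periodic_bits)]:
    derived = two_left_of_each_one(bits)
    print(f"{name:14s} original: {relative_frequency(bits):.3f}   derived: {relative_frequency(derived):.3f}")
```

In the periodic sequence every term standing two places to the left of a 1 is a 0, so the derived frequency differs from that of the original.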

We need not pursue this topic in greater detail. Serious
criticisms have been raised against the validity of the Mises
approach by Waismann, Kamke, Reichenbach, Popper, and
others. It is contended, for example, that condition (1) is in
itself meaningless; that there can be no significance to the convergence
property without defining the law of the sequence, and
since the essence of the sequence is that it should be lawless
except for condition (1), there is an inherent contradiction involved.
Actually, of course, Mises’s first condition is really a
demand on the derived convergency sequence. Similar criticisms
have been levelled against the suggestions of Kamke and
Reichenbach, in their efforts to escape from the various logical
dilemmas aroused. The result is that the scope of the Collective
becomes so restricted that the class of illustration included
reduces almost to emptiness; it becomes increasingly difficult
to find actual illustrations that satisfy the requirements, and
the statistical value of the approach is thus seriously impaired.
The importance of the subject rests therefore rather on the
nature of the logical problems raised than on any adequate
bridge that may be built between statistical and mathematical
probability.

3. Mathematical determinism 

Scientific investigation, when used as a guide to action, is
turned in the first instance towards making a prediction; it
seeks to state that if certain circumstances remain unchanged,
then an event will develop in a particular way. In mathematics
this process takes the form of strict logical deduction; in statistical
work, on the other hand, the process is essentially one of
induction, and for that reason the final statement is accompanied
with less assurance than the mathematician’s. The
difference in outlook between the mathematician and the
empiricist is, however, more apparent than real; the former has
cast aside his doubts by postulating a given set of circumstances
and relying on mathematical logic: the latter, in making his
induction, is doubtful whether the ‘givenness’ can be carried
forward. Nevertheless, both scientists arrive at a unique and
precise conclusion. It is worth while examining how this can
occur, since our examination will bring out the part which
probability estimates play in the process.

The Typical Problem of Mathematics 

Consider the problem of constructing a plane triangle from 
the knowledge of two sides and the angle included between 
them. If this knowledge is exact, the triangle can be uniquely 
constructed, and all its characteristics, e.g. the length of the 
remaining side and the angles adjacent to it, are uniquely 
calculable. 
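The uniqueness of the construction may be exhibited numerically. The sketch below assumes ordinary Euclidean trigonometry (the law of cosines); the symbols a, b for the given sides and gamma for the included angle, and the function name `solve_sas`, are our own labels.

```python
import math

def solve_sas(a, b, gamma_deg):
    """Solve a plane triangle from sides a, b and the included angle gamma (degrees).

    Returns (c, alpha_deg, beta_deg): the remaining side and the angles
    opposite a and b respectively.
    """
    gamma = math.radians(gamma_deg)
    c = math.sqrt(a * a + b * b - 2 * a * b * math.cos(gamma))  # law of cosines
    # The atan2 form avoids the ambiguity of arcsin when an angle is obtuse.
    alpha = math.atan2(a * math.sin(gamma), b - a * math.cos(gamma))
    beta = math.pi - gamma - alpha
    return c, math.degrees(alpha), math.degrees(beta)

c, alpha, beta = solve_sas(3.0, 4.0, 90.0)
print(f"remaining side = {c:.4f}, adjacent angles = {alpha:.2f} and {beta:.2f} degrees")
```

Each set of exact data yields exactly one set of results, which is all that is meant here by a determinate problem.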

This example can be taken as typical of a mathematical 
problem: certain data are given and certain unique conclusions 
follow logically. In addition to the ‘given’ facts, however, there 
are always certain tacit assumptions implicit in the discussion 
—in the above case, the assumptions of Euclidean geometry. 

Suppose now that the initial data, for the construction of a 
plane triangle, are two sides and an angle, which is not the 
included angle. Then it is well known that in general there is 
no longer a unique solution to the problem; there are in fact two 
triangles which satisfy the requirements stated. If we asked 
whether our conditions ‘determined’ the triangle, the answer 
would certainly be No. Suppose, however, that having discovered
the existence of the two solutions, we restate the
problem in the form: To construct the two triangles which
have two given sides and a given angle opposite to one of them.
The solution to our problem is now unique and has been converted
from an indeterminate problem into a determinate one
by couching the statement of the problem in appropriate
form.
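The restated problem can be put in a form which returns the whole set of triangles as its single answer. The following sketch assumes the law of sines; the labels a, b, alpha and the function name `solve_ssa` are ours. It also returns an empty list, or a single triangle, for data which admit fewer than two solutions.

```python
import math

def solve_ssa(a, b, alpha_deg):
    """Triangles with sides a, b and angle alpha (degrees) opposite side a.

    Returns a sorted list of (c, beta_deg, gamma_deg): no, one, or two solutions.
    """
    alpha = math.radians(alpha_deg)
    s = b * math.sin(alpha) / a  # sin(beta), by the law of sines
    if s > 1:
        return []
    beta_acute = math.asin(s)
    solutions = []
    for beta in {beta_acute, math.pi - beta_acute}:  # acute and obtuse candidates
        gamma = math.pi - alpha - beta
        if gamma > 1e-12:  # the three angles must fit in a triangle
            c = a * math.sin(gamma) / math.sin(alpha)
            solutions.append((c, math.degrees(beta), math.degrees(gamma)))
    return sorted(solutions)

for c, beta, gamma in solve_ssa(7.0, 10.0, 30.0):
    print(f"third side = {c:.4f}, angles = {beta:.2f} and {gamma:.2f} degrees")
```

The function is single-valued although the individual triangle is not: the indeterminacy has been absorbed into the statement of what is asked.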

We take another set of data: suppose that it is required to
construct the triangle ABC whose base AB is given and whose
angle C is a right angle. There is, of course, no such unique
triangle; any one of the triangles whose vertex C lies on the
circle whose diameter is AB satisfies the given conditions. Thus,
at first sight, the problem is necessarily indeterminate, with an
infinity of solutions. If, however, we restate the problem by
requiring the locus of the vertices of all triangles satisfying the
specified conditions, then the locus is unique, and the problem
has a unique, determinate solution.† In this, as in the preceding
case, there are assumptions inherent in the analysis:
we have, for instance, implied that all the required triangles lie
in a plane; if we become aware of this restriction and remove it,
we obtain for the locus of vertices not a circle but a sphere.
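The statement about the locus can be checked numerically for a few sample vertices. This is a sketch only: the coordinates chosen for A and B, the sample points, and the function name `angle_at_c` are our own assumptions, and a handful of points is an illustration, not a proof.

```python
import math

def angle_at_c(a_pt, b_pt, c_pt):
    """Angle ACB in degrees for points given as coordinate tuples."""
    u = [p - q for p, q in zip(a_pt, c_pt)]
    v = [p - q for p, q in zip(b_pt, c_pt)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

A, B = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)  # base AB of length 2, centred on the origin

# Vertices on the circle with diameter AB, in the plane z = 0
circle_angles = [angle_at_c(A, B, (math.cos(t), math.sin(t), 0.0)) for t in (0.3, 1.0, 2.5)]

# Vertices on the sphere with diameter AB, once the restriction to a plane is removed
sphere_points = [(0.0, 0.6, 0.8), (0.5, 0.5, math.sqrt(0.5)), (-0.2, -0.4, math.sqrt(0.8))]
sphere_angles = [angle_at_c(A, B, c) for c in sphere_points]

print([round(x, 6) for x in circle_angles], [round(x, 6) for x in sphere_angles])
```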

These examples illustrate the general proposition that every 
problem in geometry which starts from a set of data linked 
together and worked on by a logical process, leads to a unique 
result which can be regarded as the consequence of a logical 
determinism. 

Problems in classical mechanics are identical in form with 
such geometrical problems. Once more we are given certain 
entities—particles of matter, masses, electric charges, etc.— 
which correspond to the points and lines of the geometrical 
problem. In addition there are postulated fields of force or 
interactions between the particles of matter. A typical problem 
in mechanics may be posed thus: ‘A mass M (which we call 
the sun) is situated in the neighbourhood of another mass m 
(called the earth); given that the masses are moving with a 
known speed and attract each other with a force proportional to the
inverse square of the distance, what follows as regards their
paths?’ Here again the problem is in reality one of finding a 
form of statement which, with the given data assembled in 
mathematical symbols, leads to an inescapable conclusion. 

† Cf. Abel’s dictum: ‘On doit donner au problème une forme telle qu’il soit
toujours possible de le résoudre.’ (‘One should give the problem a form such
that it is always possible to solve it.’)

Consider another example: A particle is projected in any given
direction with a given velocity. Given also that the earth’s
attraction imposes on it an acceleration g downwards, where
will the particle meet the horizontal plane through the point
of projection? In these circumstances the solution is logically
unique and determinate and is applied for the prediction of
physical events. This fact is sometimes referred to as mechanical
instead of logical determinism; but to justify such terminology
we should require evidence that what is given and what is
accepted as logical necessities are both necessities of natural
mechanical processes. For the moment the important fact for
us is that the conclusion is unique, and in the circumstances
inescapable. If instead the particle is projected in any direction
with a given velocity, it is not difficult to prove that there is no
unique solution to our problem. On the other hand, if we
require the maximum range described by the particle, then once
again the solution is unique and determinate.
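The three forms of the projectile problem may be sketched with the elementary formula for the range in vacuo, R = v² sin 2θ / g. The numerical value of g, the speed of 20 m/s, and the function names are assumptions introduced only for illustration.

```python
import math

G = 9.81  # assumed value of g in m/s^2

def horizontal_range(speed, angle_deg, g=G):
    """Range on the horizontal plane through the point of projection, in vacuo."""
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / g

def maximum_range(speed, g=G):
    """Greatest range over all directions of projection; attained at 45 degrees."""
    return speed ** 2 / g

v = 20.0  # illustrative speed in m/s
for angle in (15, 30, 45, 60, 75):
    print(f"{angle:2d} deg -> {horizontal_range(v, angle):6.2f} m")
print(f"maximum range: {maximum_range(v):.2f} m")
```

With the direction given, the range is a single number; with the direction left open it is not; with the maximum demanded it is a single number once more.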

In classical mechanics, as we have described it, every problem 
can be posed in such a way that, with the given data and 
the principles for their combination, its solution is unique 
and precise, and no indeterminism need arise: the essence of 
the procedure is deterministic. Now there are two classes of
investigation in which this procedure appears to be unsatisfactory,
and both arise from the application of the classical
method to problems of prediction. As we have seen, the process
of prediction is itself necessarily a logical one: if we have
appropriately phrased our problem in the light of the data and
adopted the correct physical guiding principles, to obtain anything
but a unique solution is, in physical science, tantamount
to a failure of science. We therefore ask in what respects
our assumptions and principles may be invalid, in so far as they
relate to the question of prediction.

The Two Classes of Investigation 

Let us examine the two classes of investigation referred to 
above: both result from the problem of deciding what may be 
considered as ‘given’ in the process of prediction in Nature. 
For his own purposes, the mathematician may assume any set 
of mutually consistent hypotheses; but in order to satisfy the 
physicist, these must represent what is actually found in 
Nature. Thus, in our example of the projected particle, we 
assumed that the particle is projected with a given velocity in 
a given direction. The particle may be given, but in practice 
it is not a mathematical ‘point’ but a physical ‘piece of matter’ 
having size, shape, and weight. Again, the given velocity of 
projection is, for physical purposes, the velocity as actually
measured; and an elementary knowledge of experimental processes
tells us that it is impossible to say precisely what that is.
As far as the experimenter’s knowledge goes, it may be anything
between certain narrow limits, and a variation of even
a small amount in the velocity may make a considerable
difference in the range of the particle. The mathematician,
too, may make a tacit assumption that the particle is projected
in vacuo: the physicist, who knows better, expects the
neglected resistance of the air to make a considerable difference
to the range.

There are innumerable other factors, which we need not
describe, that cause the actual problem to differ from the
mathematical one. Even the final verification to test the
mathematical prediction of the range is subject to the same sort
of imprecision as the measured ‘length’ of the desk (p. 16). What
does this imply? It means that in assuming a series of initial
factors as ‘given’, the mathematician has followed a mathematically
determinate scheme, and has thus tacitly supposed
that all the interconnexions of his abstract isolated problem
with the rest of the universe can be legitimately ignored. If he
proposes to apply such a process to the real world, every one of
the so-called ‘given’ elements in his problem must be introduced
not in the form of a discrete quantity, but as one which
may vary within a certain band of values, determined for him
by the experimenter. The process of prediction can still be
carried through and the answer obtained is unique; but it has
to be couched, not in the form, ‘the resulting range is precisely
so much’, but in the form, ‘the range must lie within a certain
band of variation’. We must realize that, in making a prediction,
the mathematician endeavours to anticipate the measurement
that will actually be found, and that he is concerned only
with such measurements: he never discusses the question, qua
mathematician, whether the process from which these measures
emerge is itself determinate apart from this. A prediction, let
us repeat, is an attempt to anticipate measurement; and to that
extent only is it an attempt to anticipate process.
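A prediction couched as a band may be sketched for the projectile. We assume, for simplicity, motion in vacuo, g = 9.81 m/s², and bands of angle lying wholly below 45 degrees, so that the range increases with both speed and angle and the extremes fall at the ends of the bands. The numbers and the function names are illustrative only.

```python
import math

def horizontal_range(speed, angle_deg, g=9.81):
    """Range in vacuo on the horizontal plane through the point of projection."""
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / g

def range_band(speed_band, angle_band, g=9.81):
    """Band of ranges for given bands of speed and angle (angles within 0..45 degrees).

    Under that restriction the range increases with both speed and angle,
    so the extremes occur at the lower and upper ends of the input bands.
    """
    (v_lo, v_hi), (a_lo, a_hi) = speed_band, angle_band
    if not (0 <= a_lo <= a_hi <= 45):
        raise ValueError("this sketch only handles angle bands within 0..45 degrees")
    return horizontal_range(v_lo, a_lo, g), horizontal_range(v_hi, a_hi, g)

low, high = range_band(speed_band=(19.8, 20.2), angle_band=(29.5, 30.5))
print(f"the range must lie between {low:.2f} m and {high:.2f} m")
```

The answer is still unique; it is the band, and not a single figure, that is uniquely determined.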

It will be recognized that the above description of the mathematically
determinist process in physics always involves an
indeterminacy in a certain special sense: it arises from the gap
between the actual process interrelated in Nature and the
partial measures of isolated phenomena obtained by the experimenters.
It is bound up with the fact that science never studies
Nature as a whole but in fragments, tacitly assuming that ideal
apparatus can be designed that will be unaffected by the process
studied, and that processes can be discovered that are unaffected
by the apparatus used. To ignore these inescapable interconnexions,
implying that with greater refinement in apparatus
and experimental technique the mathematical hypotheses could
be made to approximate to any degree of closeness to the
physical process, is to be guilty of a methodological fallacy.

Thus the first type of indeterminacy has usually been ascribed
to experimental error, the cause of the error being assigned to
the so-called ‘laws of chance’. Whatever those laws might be,
the real implication was that the universe was ‘governed’ by
mechanical laws plus laws of chance; and that if only the latter
could be fully elucidated, the mathematician’s predictions
could be made to coincide absolutely with the experimenter’s
measurements. It is worth examining in detail why such a coincidence
could never occur. Consider this typical illustration.
A measuring apparatus has on it a measuring scale subdivided 
by fine lines: the measuring process consists in fitting a mark 
between two such subdivisions. Thus in every measurement 
there is implicit an actual experimental uncertainty, and in an 
involved experiment, into which many such measurements may 
enter, the total extent of such uncertainty may be large. 

The second class of indeterminacy does not differ fundamentally
from the first; the range of experimental uncertainty
may be much less important in magnitude but of much deeper
physical interest. In our example of the projected particle we
have seen that neither the initial position nor the initial velocity
can be exactly specified. When, however, the particle is one of
sub-atomic nature (e.g. an electron) the statement of the
initial conditions presents a special kind of difficulty. To find
its position and speed, it would have to be examined, say,
through a powerful microscope, and if it is to be visible it must
emit at least a quantum of light-energy. But this emission will
be accompanied by a rebound on the part of the electron, so
that the act of seeing it and measuring its position and speed
can only occur physically when its speed and position are in
process of change and not otherwise. Here the process of
measurement, itself part of Nature, is intimately bound up with
and involved in the actual process studied. At the moment it
does not appear to be possible to isolate one from the other by
any extension of normal scientific method. From a study of
the theory and practice of such processes it is found that the
product of the uncertainties in the experimenter’s measurement
of position, and of momentum (mass times velocity), is of
the magnitude of Planck’s constant, a certain well-known
number. Thus, the physical
limitation involved in the attempt to specify the ‘given’ conditions
for sub-atomic particles leads us to the conclusion that
both the initial position and the initial speed cannot be independently
determined to any prescribed degree of accuracy,
even if the numerous factors already involved in the first class
of problem were not present.
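The order of magnitude involved may be sketched numerically. We assume here the relation in its usual modern form, stated for position and momentum, Δx·Δp ≥ ħ/2, which is sharper than the rough statement in the text; the constants are the standard SI values for the reduced Planck constant and the electron mass, and the function name is ours.

```python
HBAR = 1.054571817e-34            # reduced Planck constant, J s
ELECTRON_MASS = 9.1093837015e-31  # kg

def min_velocity_uncertainty(delta_x, mass=ELECTRON_MASS):
    """Smallest velocity uncertainty (m/s) compatible with a position uncertainty delta_x (m).

    Assumes delta_x * (mass * delta_v) >= HBAR / 2.
    """
    return HBAR / (2.0 * mass * delta_x)

for delta_x in (1e-6, 1e-8, 1e-10):
    dv = min_velocity_uncertainty(delta_x)
    print(f"delta_x = {delta_x:.0e} m  ->  delta_v >= {dv:.3e} m/s")
```

The narrower the band assigned to the position, the wider the band that must be conceded to the velocity; the two cannot be prescribed independently.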

Let us emphasize once more the distinction between the two
classes of problem. In the first class, despite the uncertainties
which arise from the entanglement of the abstracted problem
with the rest of the universe, the mathematical logic of the
abstract process can still be carried through; in the second class
the mathematician who has exposed one of the forms of entanglement
is faced with the fact that if he attempts to allow for it
initially, the mathematical logic he intended to use no longer
avails him. Two quantities which, for the purposes of his
logic, should be initially independent, are shown to be interlocked.
Accordingly, he is now faced with a new class of
problem: given that the initial speed and position are interrelated
in the manner described, what are the guiding processes
to be assumed for such a group of entities, in order that a unique
answer may be obtained, and what will be the general nature of
that answer? It must be realized that we are still dealing with
a question of mathematical determinism; and although we may
find as a result of such an investigation that our prediction
asserts that after a certain interval of time the electrified
‘particle’ may be anywhere within a certain region, this does
not vitiate the fact that the process is still deterministic; the
problem has only to be correctly stated. The mathematical
process determines uniquely for us what can be derived from
the given assumed circumstances. The so-called ‘uncertainty’
resides simply in the physical specification of the assumptions.

A stream of electrons presumably in parallel motion, when 
striking a film, distribute themselves in the form of concentric 
rings. The fact that this phenomenon may be described mathematically
as ‘the probability of an electron falling at any
distance from the centre of the film is some kind of function of
position’, implies not that there is a physical indeterminateness
in the fate of any individual electron, but that, in the circumstances,
the probability distribution describes the behaviour
for the stream of electrons. If, therefore, we desire to restate 
the deterministic conclusions concerning the group-distribution 
in terms of the behaviour of an individual electron, we can only 
do this by describing its behaviour in terms of probability. This 
does not imply an uncertainty in its intrinsic behaviour, but a 
lack of detailed knowledge for solving the new problem. 



CHAPTER III 


THE THEORY OF ARRANGEMENTS 


Since the mathematical theory of probability treats of the 
relative frequency with which certain groups of objects may be 
conceived as arranged within a population, one type of problem 
which we have to consider, preparatory to the main investigation,
is concerned with the number of ways in which various
sub-groups may be formed or partitioned from the members of 
a larger group. Many of the theorems arising from this problem 
are of an elementary nature and to these the present chapter is 
devoted. 

In dealing with objects in groups we are led to consider two 
kinds of arrangement, according as the order of the objects in 
the groups is or is not taken into account. 

Definition. The number of different ways in which n objects
can be arranged in groups of r, regard being had to the order of
arrangement, is called the number of r-permutations of the n
objects.

Evidently, two permutations are identical when they contain
the same objects arranged in the same order.

If the n given objects are all different, the number of r-permutations
is denoted by the symbol ${}^{n}P_{r}$.


To find the number of r-permutations of n different objects

To form any one arrangement we may select any one of the
objects to be the first in the arrangement; such a selection can
be made in $n$ ways. The second object in our arrangement may
be any one of the remaining $n-1$; thus there are $n(n-1)$ ways
of arranging the first two objects. Similarly, the selection of the
first three objects can be made in $n(n-1)(n-2)$ ways. Thus, in
general, we can select r objects in $n(n-1)(n-2)\cdots(n-r+1)$
ways; and therefore

$${}^{n}P_{r} = n(n-1)(n-2)\cdots(n-r+1).$$


Corollary. The number of n-permutations of n different
objects is

$${}^{n}P_{n} = n(n-1)(n-2)\cdots 3\cdot 2\cdot 1.$$

The product $n(n-1)(n-2)\cdots 3\cdot 2\cdot 1$ is denoted by the symbol
$n!$, called 'factorial n'. To obtain consistency in our notation,
we make the convention that the symbols 1! and 0! are to be
interpreted as being equal to unity.

Ex. 1 . How many different numbers can be formed by using four out 
of the nine digits 1, 2, 3,..., 9? 

The required number is ${}^{9}P_{4} = 9\cdot 8\cdot 7\cdot 6 = 3,024$.

Ex. 2 . How many different numbers, each of four digits, can be 
formed from the ten digits 0, 1, 2,..., 9? 

The total number of 4-permutations of the digits is ${}^{10}P_{4}$, and from
this we must deduct the number of permutations in which 0 occupies
the first place, that is, ${}^{9}P_{3}$. Hence the required number is

$${}^{10}P_{4} - {}^{9}P_{3} = 5,040 - 504 = 4,536.$$
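The two counts in Ex. 1 and Ex. 2 can be checked mechanically. The following is a minimal Python sketch; the function name `count_r_permutations` is our own and is not part of the text, and the brute-force enumeration is included only as an independent check.

```python
from itertools import permutations
from math import perm


def count_r_permutations(n, r):
    """Number of ordered selections of r objects from n different objects."""
    return perm(n, r)


# Ex. 1: four different digits chosen from 1..9.
ex1 = count_r_permutations(9, 4)

# Ex. 2: 4-permutations of the digits 0..9, less those with 0 in the first place.
ex2 = count_r_permutations(10, 4) - count_r_permutations(9, 3)

# Independent check of Ex. 2 by direct enumeration.
ex2_brute = sum(1 for p in permutations(range(10), 4) if p[0] != 0)

print(ex1, ex2, ex2_brute)
```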

Ex. 3. Show that the number of ways in which n books can be
arranged on a shelf so that two particular books are not together is
$(n-2)(n-1)!$.

To find the number of permutations of n objects which are not all
different

Let the n objects be represented by letters, and suppose that
p of them are a's, q of them b's, r of them c's, and so on.

If for a moment we suppose that the p letters a are changed
into letters which are different from each other and from the
rest, then by changing only the arrangement of these new
letters, we should have, instead of one permutation, $p!$ different
permutations.

Hence, if P is the required number of permutations, the
number of permutations now obtained is $P\,p!$.

Similarly, if we suppose that the b's are changed into q letters
different from each other and from the rest, the number of
permutations is now $P\,p!\,q!$. Proceeding in this manner, we
see that if all the letters are changed so that no two are alike,
the total number of permutations is $P\,p!\,q!\,r!\cdots$.

But in this case it is clear that the total number of permutations
is $n!$. Hence $P\,p!\,q!\,r!\cdots = n!$, so that

$$P = \frac{n!}{p!\,q!\,r!\cdots}.$$

This result is, apparently, due to Montmort (1708). 

Ex. 1. The number of permutations of all the letters of the word
Mississippi is

$$\frac{11!}{4!\,4!\,2!} = 34,650.$$
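The formula $n!/(p!\,q!\,r!\cdots)$ lends itself to a short computation. The sketch below is illustrative only; the helper `multiset_permutations` is a name we introduce here, and it simply divides $n!$ by the factorial of each letter's multiplicity.

```python
from collections import Counter
from math import factorial


def multiset_permutations(word):
    """n! / (p! q! r! ...), where p, q, r, ... are the letter multiplicities."""
    total = factorial(len(word))
    for multiplicity in Counter(word).values():
        # Each partial quotient is itself an integer, so integer division is exact.
        total //= factorial(multiplicity)
    return total


mississippi = multiset_permutations("mississippi")
print(mississippi)
```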
Ex. 2. Find the number of r-permutations of n objects, when each
can be repeated any number of times.

Any one of the n objects can be selected first, and, since repetition
is allowed, any one of the n objects is still available for the second
selection; and so on. Hence the required number is $n^{r}$.


Ex. 3. Show that the number of permutations of n objects all together, 
in which r specified objects are to be in an assigned order, is n!/r!. 

Ex. 4. Prove that ${}^{n+1}P_{r} = {}^{n}P_{r} + r\,{}^{n}P_{r-1}$.


Definition. The number of different ways in which n objects
can be separated into groups of r, irrespective of the order of
arrangement, is called the number of r-combinations of the n objects.

When the objects are all different, the number of r-combinations
is denoted by ${}^{n}C_{r}$.


To find the number of r-combinations of n different objects.

It is clear that every such combination would give rise to $r!$
permutations, if the order of the objects were altered in all
possible ways. Hence we have

$$r!\,{}^{n}C_{r} = {}^{n}P_{r}.$$

The same result may be obtained otherwise, as follows: Consider those
r-combinations which contain a particular object; evidently the number
of such combinations is ${}^{n-1}C_{r-1}$. Thus, in the total number of
r-combinations every object occurs ${}^{n-1}C_{r-1}$ times, and therefore the total
number of objects included is $n\,{}^{n-1}C_{r-1}$. But since r objects occur in
each combination, the total number must also be $r\,{}^{n}C_{r}$. We thus derive
the relation

$$r\,{}^{n}C_{r} = n\,{}^{n-1}C_{r-1}.$$

This holds for all the values of n and r. Changing n into n-1 and r into
r-1, we have in succession

$$(r-1)\,{}^{n-1}C_{r-1} = (n-1)\,{}^{n-2}C_{r-2},$$

$$(r-2)\,{}^{n-2}C_{r-2} = (n-2)\,{}^{n-3}C_{r-3},$$

$$\cdots$$

$$1\cdot{}^{n-r+1}C_{1} = n-r+1.$$

Multiplying together corresponding members of these equations and
cancelling the common factors, we obtain

$${}^{n}C_{r} = n(n-1)(n-2)\cdots(n-r+1)/r!.$$

Note that ${}^{n}C_{r}$ may be written as $n!/\{r!\,(n-r)!\}$.

Corollary 1. The number of r-combinations of n different
objects is equal to the number of (n-r)-combinations of the n
objects.

For ${}^{n}C_{n-r} = n!/\{(n-r)!\,(n-n+r)!\} = n!/\{r!\,(n-r)!\} = {}^{n}C_{r}$.
Corollary 2. ${}^{n}C_{r} + {}^{n}C_{r-1} = {}^{n+1}C_{r}$.

We have

$${}^{n}C_{r} + {}^{n}C_{r-1} = \frac{n(n-1)\cdots(n-r+1)}{r!} + \frac{n(n-1)\cdots(n-r+2)}{(r-1)!}$$

$$= \frac{n(n-1)\cdots(n-r+2)\{(n-r+1)+r\}}{r!}$$

$$= \frac{(n+1)n(n-1)\cdots(n-r+2)}{r!} = {}^{n+1}C_{r}.$$


We leave as an exercise to the reader the proof of these results 
from first principles. 
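A numerical check is no substitute for the proof from first principles, but it is a useful safeguard against slips in the algebra. The sketch below (our own illustration, with the hypothetical helper `check_identities`) tests the four relations of this section over a range of small values of n and r.

```python
from math import comb, factorial, perm


def check_identities(n, r):
    """Return True if the relations of this section hold for given n >= r >= 1."""
    return (
        factorial(r) * comb(n, r) == perm(n, r)            # r! nCr = nPr
        and r * comb(n, r) == n * comb(n - 1, r - 1)       # r nCr = n (n-1)C(r-1)
        and comb(n, n - r) == comb(n, r)                   # Corollary 1
        and comb(n, r) + comb(n, r - 1) == comb(n + 1, r)  # Corollary 2
    )


all_hold = all(check_identities(n, r) for n in range(1, 30) for r in range(1, n + 1))
print(all_hold)
```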


Ex. 1. Find the number of diagonals of a polygon of n sides.
The number is

$${}^{n}C_{2} - n = \tfrac{1}{2}n(n-1) - n = \tfrac{1}{2}n(n-3).$$

Ex. 2. In how many ways can a committee of 6 be formed from a
party of 5 ladies and 8 gentlemen, if the committee is to contain 2 ladies?

The number of ways of choosing the ladies is ${}^{5}C_{2}$; the number of ways
of choosing the gentlemen is ${}^{8}C_{4}$. Thus the number of possible ways is

$${}^{5}C_{2} \times {}^{8}C_{4} = \frac{5\cdot 4}{1\cdot 2}\cdot\frac{8\cdot 7\cdot 6\cdot 5}{1\cdot 2\cdot 3\cdot 4} = 700.$$

Ex. 3. If the committee is to contain at most 2 ladies, then the
number of possible selections is

$${}^{5}C_{2}\times{}^{8}C_{4} + {}^{5}C_{1}\times{}^{8}C_{5} + {}^{8}C_{6} = 700 + 280 + 28 = 1,008.$$

Ex. 4. Show that, in the n -combinations of 2n different objects, the 
number of combinations in which a particular object occurs is equal to 
the number in which it does not occur. 

Ex. 5. Given n points in a plane such that no two of the lines joining 
pairs of points are parallel and no three are concurrent save those which 
pass through one of the given points, in how many points do the lines 
intersect ? 


For the further discussion of problems of arrangement a number 
of preliminary theorems are required. 


Use of Stirling's Theorem 

From the above examples it will be noted that the calculation
of ${}^{n}P_{r}$ and ${}^{n}C_{r}$, when n and r are large, may be a tedious if not
a difficult process. For the purpose of approximate evaluation,
it is often convenient to replace the factorial expressions which
occur by other expressions to which they tend asymptotically.
A formula due to Stirling (1718), which we shall establish later
(p. 65), tells us that

$$n! = \sqrt{2\pi n}\; n^{n} e^{-n}\left(1 + \frac{1}{12n} + \cdots\right),$$

where $e = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots = 2.71828\ldots$.

Thus the relative error involved in taking only the first term
in the above formula is about $1/12n$, i.e. $8/n$ per cent.,
approximately.

Ex. We have 5! = 120, while the first term of the Stirling formula
gives 5! = 118.0. Again, 10! = 3,628,800; Stirling's formula gives
about 3,598,696.
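The size of the error is easy to examine numerically. The following Python sketch (an illustration of ours, not part of the original treatment) evaluates the first term $\sqrt{2\pi n}\,n^{n}e^{-n}$ and compares its relative shortfall with the estimate $1/12n$; the function names are hypothetical.

```python
from math import e, factorial, pi, sqrt


def stirling_first_term(n):
    """First term of Stirling's series: sqrt(2*pi*n) * n**n * e**(-n)."""
    return sqrt(2 * pi * n) * n**n * e ** (-n)


def relative_error(n):
    """Relative shortfall of the first term compared with the exact n!."""
    exact = factorial(n)
    return (exact - stirling_first_term(n)) / exact


for n in (5, 10, 20):
    # Columns: n, exact n!, first term, relative error, the estimate 1/(12n).
    print(n, factorial(n), round(stirling_first_term(n), 1),
          relative_error(n), 1 / (12 * n))
```

The last two columns printed should agree closely, and the agreement improves as n grows, which is what the neglected terms of the series would lead one to expect.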

The Binomial Theorem 

Suppose that we are given n letters $a_1, a_2, \ldots, a_n$, and that we wish to
evaluate the product

$$(1+a_1)(1+a_2)\cdots(1+a_n).$$

The first term in the expanded form of this product, in which none of
the letters occurs, is 1; the next term, in which each letter occurs once,
is the sum of all the letters, denoted by $\sum a_1$; the next term consists
of the sum of the products of all the letters taken two at a time, denoted
by $\sum a_1a_2$; and so on. The final term is simply the product of the n
letters altogether. Thus we have

$$(1+a_1)(1+a_2)\cdots(1+a_n) = 1 + \sum a_1 + \sum a_1a_2 + \sum a_1a_2a_3 + \cdots + a_1a_2\cdots a_n.$$

Now suppose that we write $a_1 = a_2 = \cdots = a_n = x$; the product
becomes $(1+x)^n$. The term $\sum a_1$ is evidently $x\,{}^{n}C_{1}$, the term $\sum a_1a_2$ is
$x^2\,{}^{n}C_{2}$, and so on. Hence, substituting these results, we obtain

$$(1+x)^n = 1 + {}^{n}C_{1}x + {}^{n}C_{2}x^2 + {}^{n}C_{3}x^3 + \cdots + x^n.$$

This expansion is known as the binomial theorem for a positive exponent n.

The Binomial Coefficients

We write the binomial expansion in the form

$$(1+x)^n = c_0 + c_1x + c_2x^2 + \cdots + c_rx^r + \cdots + c_nx^n. \qquad (1)$$

The coefficient $c_r$ is equal to ${}^{n}C_{r} = {}^{n}C_{n-r}$, by Corollary 1 (p. 43). Thus
$c_r = c_{n-r}$; that is, the coefficient of $x^r$ is equal to the coefficient of $x^{n-r}$.
Putting x = 1 in the identity (1) we obtain

$$c_0 + c_1 + c_2 + \cdots + c_n = 2^n.$$

Putting x = -1, we have

$$c_0 - c_1 + c_2 - c_3 + \cdots + (-1)^n c_n = 0.$$

From these results it follows that

$$c_0 + c_2 + c_4 + \cdots = c_1 + c_3 + c_5 + \cdots = 2^{n-1}.$$
Now put x = i, where $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, and so on. Thus

$$c_0 + ic_1 - c_2 - ic_3 + c_4 + \cdots = (1+i)^n.$$

Putting x = -i, we have

$$c_0 - ic_1 - c_2 + ic_3 + c_4 - \cdots = (1-i)^n.$$

By addition we obtain

$$c_0 - c_2 + c_4 - c_6 + \cdots = \tfrac{1}{2}\{(1+i)^n + (1-i)^n\}.$$

By subtraction,

$$c_1 - c_3 + c_5 - \cdots = \frac{1}{2i}\{(1+i)^n - (1-i)^n\}.$$
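These alternating sums can be confirmed with a few lines of Python, which has complex arithmetic built in (`1j` plays the part of i). This is only a sketch for checking the two closed forms; the function names `alternating_sums` and `closed_forms` are ours.

```python
from math import comb


def alternating_sums(n):
    """Return (c0 - c2 + c4 - ..., c1 - c3 + c5 - ...) for (1 + x)^n."""
    c = [comb(n, r) for r in range(n + 1)]
    even = sum((-1) ** (r // 2) * c[r] for r in range(0, n + 1, 2))
    odd = sum((-1) ** (r // 2) * c[r] for r in range(1, n + 1, 2))
    return even, odd


def closed_forms(n):
    """The same two sums obtained from (1 + i)^n and (1 - i)^n."""
    plus, minus = (1 + 1j) ** n, (1 - 1j) ** n
    even = (plus + minus) / 2
    odd = (plus - minus) / 2j
    return round(even.real), round(odd.real)


agree = all(alternating_sums(n) == closed_forms(n) for n in range(1, 25))
print(agree)
```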

Ex. 1. By considering the product of $(1+x)^n$ and $(1+1/x)^n$, show that

$$c_0^2 + c_1^2 + c_2^2 + \cdots + c_n^2 = (2n)!/(n!)^2.$$

Ex. 2. Find the value of

$$c_0^2 - c_1^2 + c_2^2 - \cdots.$$

Ex. 3. Prove that

$$c_1 + 2c_2 + 3c_3 + \cdots + nc_n = n2^{n-1}.$$

Greatest Term in the Expansion 

In the expansion of $(1+x)^n$, where n is a positive integer, and x is
positive, the ratio of the (r+1)th term to the rth is evidently

$$\frac{n(n-1)\cdots(n-r+1)}{r!}\cdot\frac{(r-1)!}{n(n-1)\cdots(n-r+2)}\,x = \frac{n-r+1}{r}\,x.$$

This ratio can be written as $\left(\frac{n+1}{r} - 1\right)x$, and since $\frac{n+1}{r}$ decreases as
r increases, the ratio itself decreases as r increases. If the ratio is less
than 1 for any value of r, the (r+1)th term will be less than the rth.
Hence, in order that the rth term should be the greatest we must have

$$\frac{n-r+1}{r}\,x < 1 \quad\text{and}\quad \frac{n-r+2}{r-1}\,x > 1.$$

Thus r satisfies the inequalities

$$r > \frac{(n+1)x}{x+1}, \qquad r < \frac{(n+1)x}{x+1} + 1.$$

When $r = \frac{(n+1)x}{x+1}$, we have $\frac{n-r+1}{r}\,x = 1$; in this case there is no one
greatest term in the expansion, but the rth and (r+1)th terms are equal,
and are greater than any of the other terms.

If x is negative, the terms of the expansion alternate in sign, but the
method used above still avails to determine the numerically greatest
term in the expansion.
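The inequalities for r can be compared with a direct search over the terms. The sketch below is ours: the function names are hypothetical, and the values n = 10, x = 2/3 and n = 3, x = 1 used in the illustration are assumed for the sake of example and are not taken from the text. Exact fractions are used so that equal terms are recognised as equal.

```python
from fractions import Fraction
from math import comb


def greatest_terms(n, x):
    """Positions r (counting from 1) of the greatest term(s) of (1 + x)^n, x > 0."""
    # The rth term of the expansion is nC(r-1) * x^(r-1).
    terms = [comb(n, r - 1) * x ** (r - 1) for r in range(1, n + 2)]
    largest = max(terms)
    return [r for r, t in enumerate(terms, start=1) if t == largest]


def bounds(n, x):
    """The bounds (n+1)x/(x+1) < r < (n+1)x/(x+1) + 1 derived above."""
    lower = (n + 1) * x / (x + 1)
    return lower, lower + 1


# Assumed illustration: n = 10, x = 2/3 gives 22/5 < r < 27/5, hence r = 5.
print(bounds(10, Fraction(2, 3)), greatest_terms(10, Fraction(2, 3)))

# Assumed illustration of the tie: n = 3, x = 1 gives (n+1)x/(x+1) = 2 exactly,
# so the 2nd and 3rd terms are equal and greatest.
print(bounds(3, Fraction(1)), greatest_terms(3, Fraction(1)))
```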

Ex. Find the greatest term in the expansion of (l+z) 10 , when x = f. 


The Multinomial Theorem 

If n is a positive integer, the expression $(x_1+x_2+\cdots+x_m)^n$ may
be expanded in a form analogous to that obtained for $(1+x)^n$.
Thus, to distribute the product of n factors

$$(x_1+x_2+\cdots+x_m)(x_1+x_2+\cdots+x_m)\cdots(x_1+x_2+\cdots+x_m)$$

we have to find the coefficient of any given term, for example

$$x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_m^{\alpha_m},$$

where $\alpha_1+\alpha_2+\cdots+\alpha_m = n$.

Evidently, the number of times that this particular term
arises in the product is the number of n-permutations of the
m letters, in which $\alpha_1$ are alike, $\alpha_2$ are alike, ..., and so on.
Hence by the theorem given above (p. 42) the coefficient of the
given term is

$$\frac{n!}{\alpha_1!\,\alpha_2!\cdots\alpha_m!}.$$

Thus, finally, we obtain

$$(x_1+x_2+\cdots+x_m)^n = \sum \frac{n!}{\alpha_1!\,\alpha_2!\cdots\alpha_m!}\,x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_m^{\alpha_m},$$

where $\alpha_1, \alpha_2, \ldots, \alpha_m$ take all integral values, zero or positive, for which
$\alpha_1+\alpha_2+\cdots+\alpha_m = n$.

This result is the multinomial theorem for a positive integral
index.
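The counting argument behind the multinomial coefficient can be imitated directly: multiply out the n factors by choosing one letter from each, and count how often a given set of exponents arises. The following is a small Python sketch of that idea; both function names are ours and the enumeration is practical only for small n and m.

```python
from itertools import product
from math import factorial


def multinomial_coefficient(alphas):
    """n! / (a1! a2! ... am!) with n = a1 + a2 + ... + am."""
    result = factorial(sum(alphas))
    for a in alphas:
        result //= factorial(a)
    return result


def count_by_expansion(m, alphas):
    """Count how often the term with exponents `alphas` arises when the
    n factors (x1 + ... + xm) are multiplied out one choice at a time."""
    n = sum(alphas)
    target = tuple(alphas)
    hits = 0
    for choice in product(range(m), repeat=n):
        exponents = tuple(choice.count(i) for i in range(m))
        hits += exponents == target
    return hits


print(multinomial_coefficient((2, 1, 1)), count_by_expansion(3, (2, 1, 1)))
```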


The Binomial Series 

If n is not a positive integer, the series

$$1 + nx + \frac{n(n-1)}{2!}x^2 + \frac{n(n-1)(n-2)}{3!}x^3 + \cdots$$

does not terminate; we may show that it converges for all values of x
which are numerically less than unity. When n is a negative integer
the sum of the series, for such values of x, is equal to $(1+x)^n$; and when
n is a rational number the sum is equal to the principal value of $(1+x)^n$,
i.e. the real positive value of this expression.

Thus, if n = -m, where m is a positive integer, then

$$(1+x)^{-m} = 1 - mx + \frac{m(m+1)}{2!}x^2 - \frac{m(m+1)(m+2)}{3!}x^3 + \cdots,$$

provided $|x| < 1$. In particular, we have

$$(1+x)^{-1} = 1 - x + x^2 - \cdots,$$

$$(1-x)^{-1} = 1 + x + x^2 + \cdots,$$

$$(1+x)^{-2} = 1 - 2x + 3x^2 - \cdots.$$

To find ${}^{n}H_{r}$. The number ${}^{n}H_{r}$ of homogeneous products of r
letters which can be formed from n given letters may be found
by a method which will be employed extensively later. Suppose
that the letters are a, b, c, .... If we form the product

$$(1+ax+a^2x^2+\cdots+a^rx^r)(1+bx+b^2x^2+\cdots+b^rx^r)(1+cx+c^2x^2+\cdots+c^rx^r)\cdots$$

it is at once seen that the sum of the required homogeneous
products is the coefficient of $x^r$ in this product. Hence the
number of such products is the coefficient of $x^r$ in the product

$$(1+x+x^2+\cdots+x^r)(1+x+x^2+\cdots+x^r)\cdots$$

consisting of n identical factors.

If we suppose that $|x| < 1$, this is the coefficient of $x^r$ in the
expansion of $(1-x)^{-n}$. Thus, by the previous result, we have

$${}^{n}H_{r} = \frac{n(n+1)(n+2)\cdots(n+r-1)}{r!} = {}^{n+r-1}C_{r}.$$
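The result for ${}^{n}H_{r}$ can be checked in two independent ways: by listing the homogeneous products themselves, and by multiplying out the n identical factors $(1+x+\cdots+x^r)$ as the argument above suggests. The Python sketch below does both; the function names are ours and the polynomial multiplication is written out by hand so that nothing beyond the standard library is assumed.

```python
from itertools import combinations_with_replacement
from math import comb


def homogeneous_products(n, r):
    """nHr = (n+r-1)Cr, the number of products of degree r in n letters."""
    return comb(n + r - 1, r)


def homogeneous_products_brute(n, r):
    """Enumerate the products directly as unordered selections with repetition."""
    return sum(1 for _ in combinations_with_replacement(range(n), r))


def coefficient_in_truncated_product(n, r):
    """Coefficient of x^r in (1 + x + ... + x^r)^n by polynomial multiplication."""
    poly = [1]
    factor = [1] * (r + 1)
    for _ in range(n):
        new = [0] * (len(poly) + r)
        for i, a in enumerate(poly):
            for j, b in enumerate(factor):
                new[i + j] += a * b
        poly = new
    return poly[r]


print(homogeneous_products(3, 2),
      homogeneous_products_brute(3, 2),
      coefficient_in_truncated_product(3, 2))
```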



CHAPTER IV 


ELEMENTARY THEOREMS ON MATHEMATICAL 
PROBABILITY 


We begin with a restatement of our definition in simplified 
form: 


If there is a class of N letters containing n letters a, then the
probability of a letter, specified as belonging to the class N, being
a letter a is n/N.

By ‘probability’ in this chapter it is understood we mean 
mathematical probability. 

Suppose, for example, that we have a group of symbols which
are separable into the numbers 1, 2, ..., 9, the letters a, b, c, and
the letters α, β, γ. A particular symbol is defined as being a
member of the whole class. We may then state, on the definition,
that the probability that the symbol is a number is

$$\frac{9}{9+3+3} = \frac{9}{15} = \frac{3}{5};$$

the probability that it is a Roman letter is

$$\frac{3}{15} = \frac{1}{5};$$

and the probability that it is a Greek letter is

$$\frac{3}{15} = \frac{1}{5}.$$

It should be noticed that the probability that the symbol is
a letter and not a number is
$\frac{6}{15} = \frac{3}{15} + \frac{3}{15} = \frac{1}{5} + \frac{1}{5} = \frac{2}{5}$. Thus
the probability that the symbol, defined as a member of the
whole class, should be a member of the class consisting of the
two subclasses of letters, is the sum of the probabilities that it
is a member of each of the two subclasses. This result is an
illustration of the following general theorem.

Theorem. An object is defined as belonging to a class of N
objects which contains the subclasses of objects $a_1$, $a_2$, in number
$n_1$, $n_2$ respectively, having no members in common. Then if the
probabilities that the object belongs to the subclasses $a_1$, $a_2$ be
separately $p_1$ and $p_2$, the probability that it belongs to the combined
group of objects $a_1+a_2$ is $p_1+p_2$.

The proof of this theorem follows at once from the definition.
Evidently the result may be extended step by step to give the
probability that an object of the class N should belong to the
group $a_1+a_2+a_3$, or the group $a_1+a_2+a_3+a_4$; and so on.
Ex. Suppose that we are given a book of N pages such that $n_1$ of the
pages each contain one printer's error, $n_2$ contain two such errors, ...,
and generally, $n_r$ contain r errors. Then

the probability that a page has r errors is $n_r/N$;

the probability that a page has at least r errors is
$(n_r+n_{r+1}+\cdots)/N$;

the probability that a page has not less than r and not more than
s errors is $(n_r+n_{r+1}+\cdots+n_s)/N$.

For it is clear that if a page has, say, r errors, it cannot have s errors,
where r and s are unequal; so that the classes of pages so defined have
no members in common, and our theorem can be applied.

It is obvious from our definition that mathematical probability
is a number lying between 0 and 1 and that, since it is the
ratio of two integers, it must be a proper fraction. We shall have
occasion later to extend the definition.

When the probability p is equal to unity, its maximum value
is attained; in such a case the class to which the object belongs
is identical in extent with the subclass. It is desirable to avoid
referring to the case p = 1 as 'certainty' for this would seem to
imply a psychological state to which our numbers have not
necessarily any direct relevance. (One may be certain of the
truth of a falsehood.) Similarly, the case p = 0 is frequently
referred to as representing 'falsehood', and to this the same
criticism applies; in point of fact, p = 0 is excluded from our
consideration, for such a value of p would imply that the
subclass is not a member of the whole class.

Mathematical Expectation 

Let the letters $a_1, a_2, \ldots$ denote particular classes of events,
with which are associated numbers $M_1, M_2, \ldots$. For example, the
events might be the actual processes of measuring some object,
and $M_1, M_2, \ldots$ the magnitudes obtained. Then the probability
of occurrence of the event is also the probability of occurrence
of the magnitude.

If $p_1$ is the probability that the event $a_1$ will produce a
magnitude $M_1$, then its mathematical expectation is defined as $p_1M_1$.
Thus, a person tosses a coin; if it turns up heads he is to receive
a shilling; otherwise he receives nothing. Then the probability
of winning a shilling is $\frac{1}{2}$ and the expectation is sixpence.

More generally, in the case of n independent events, for which
the probabilities that the events will produce magnitudes
$M_1, M_2, \ldots, M_n$ are respectively $p_1, p_2, \ldots, p_n$, the expectation E
associated with some unspecified event of the set is

$$E = \sum_{r=1}^{n} p_rM_r.$$

Ex. If n measurements, all equally probable, are made of the same 
length, show that their mathematical expectation is the average value. 

Theorem. If p is the probability that a member of a class is
also a member of a given subclass, then 1-p is the probability that
it is not a member of that subclass.

For if the class N can be divided into subclasses having n and
N-n members respectively, and p = n/N, then the probability
that an object is not a member of the class (n) is the probability
that it is a member of the class (N-n). This probability
is (N-n)/N = 1-p, which proves the proposition. For
convenience we write q = 1-p. If this relation is written in the
form p+q = 1, it is equivalent to the assertion that 'it is true
that an object is either a member of a particular subclass or
of the class of remaining objects'.

Ex. 1. The probability that a coin falls either on its head or its tail,
given that it falls flat, is 1. If the probability that it falls on its head
is $\frac{1}{2}$, then the probability that it falls on its tail is also $\frac{1}{2}$. Thus, the
probability that it falls on its head = 1 - (the probability that it does not).

Ex. 2. In the example (p. 50), the probability that a page has
not more than r errors is $1 - (n_{r+1}+n_{r+2}+\cdots)/N$. The probability that it
has no errors is $1 - (n_1+n_2+\cdots)/N$.

Ex. 3. Consider two dice each marked with the numbers 1 to 6. It 
is given that each lies with a face upwards: what is the probability that 
both faces show fours ? 

To find the total number of members of the class of pairs of faces,
one for each die, we observe that each of the faces of one die may be
grouped with each face of the other, thus giving 6 x 6 = 36 members
of the class. There is only one member of the class (4, 4); thus the
probability that both faces show fours is $\frac{1}{36}$. We notice that
$\frac{1}{36} = \frac{1}{6}\times\frac{1}{6}$, i.e. equals the probability that a face of one die is
a four, multiplied by the probability that a face of the other die is a four.

Ex. 4. In a certain examination, 10 of the 30 students receive over, 
and 20 under, 60 per cent, of the total marks. It is known that two- 
thirds of the candidates have written their papers in ink and the rest 
in pencil. An examiner selects a name from the list of 30: what is the 
probability that the candidate selected wrote his script in pencil and 
received more than half marks ? 

These illustrations are typical of the following result: 

Theorem. If $p_1$ is the probability that an object belongs to the
subclass $a_1$ of the classes $a_1, a_2, \ldots, a_r$, and $P_1$ is the probability of
its belonging to the subclass $A_1$ of the classes $A_1, A_2, \ldots, A_s$ (which
are exclusive to $a_1, a_2, \ldots, a_r$), then the probability that it belongs to
the combined class $a_1A_1$ is $p_1P_1$.

Let us set out the classes in a scheme, as follows:

First class . . . . . . . .  $a_1, a_2, a_3, \ldots, a_r$
Number of members . . . . .  $n_1, n_2, n_3, \ldots, n_r$
Probability . . . . . . . .  $p_1, p_2, p_3, \ldots, p_r$

Second class  . . . . . . .  $A_1, A_2, A_3, \ldots, A_s$
Number of members . . . . .  $N_1, N_2, N_3, \ldots, N_s$
Probability . . . . . . . .  $P_1, P_2, P_3, \ldots, P_s$

Thus

$$p_1 = \frac{n_1}{n_1+n_2+\cdots+n_r} \quad\text{and}\quad P_1 = \frac{N_1}{N_1+N_2+\cdots+N_s}.$$

Combined class  . . . . . .  $a_1A_1, a_1A_2, \ldots, a_rA_s$
Number of members . . . . .  $n_1N_1, n_1N_2, \ldots, n_rN_s$
Total number of members . .  $(n_1+n_2+\cdots+n_r)(N_1+N_2+\cdots+N_s)$

Hence the required probability is

$$\frac{n_1N_1}{(n_1+n_2+\cdots+n_r)(N_1+N_2+\cdots+N_s)} = p_1P_1.$$

This theorem, sometimes known as the Multiplication
Theorem, may be illustrated geometrically as follows.

Let ABCD be a square of unit side, and let DF, DG represent
the lengths corresponding to the probabilities $p_1$ and $P_1$. Then
if DC and DA be subdivided equally, each as many times
as there are members of the two classes $a_1, a_2, \ldots, a_r$ and
$A_1, A_2, \ldots, A_s$, and rectangles are formed by drawing parallels
to DC, DA through the extremities of these subdivisions, the
required probability will be the ratio of the number of rectangles
within DGHF to the number of rectangles in ABCD. Since
the area of the latter is unity, the ratio is $p_1P_1$. The probability
of such a combined class is referred to as that of a 'double
event'. We note that, if $p_1$ and $p_2$ are the successive
probabilities of two individual events, the probability of the double
event not occurring is $1 - p_1p_2$.

Ex. 1. In a certain book of N pages, no page contains more than
three errors; $n_1$ of the pages contain one error, $n_2$ contain two errors,
and $n_3$ three errors. Two copies of the book are opened at any two
given pages. Then the probability that both pages have two errors is
$n_2^2/N^2$; the probability that the total number of errors is 4 is

$$(n_1n_3 + n_2^2 + n_3n_1)/N^2 = (2n_1n_3 + n_2^2)/N^2;$$

the probability that the total number is 5 is

$$(n_2n_3 + n_3n_2)/N^2 = 2n_2n_3/N^2;$$

the probability that the total number is 6 is $n_3^2/N^2$; the probability that
the total number is at least 5 is $n_3^2/N^2 + 2n_2n_3/N^2$; the probability
that the total number is not more than 4 is $1 - (n_3^2 + 2n_2n_3)/N^2$.
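The several probabilities of Ex. 1 can be obtained together by forming every pairing of a page from the first copy with a page from the second. The sketch below is illustrative only: the page counts N = 100, $n_1$ = 10, $n_2$ = 5, $n_3$ = 2 are hypothetical values chosen by us, and `total_error_distribution` is our own name.

```python
from fractions import Fraction
from itertools import product


def total_error_distribution(N, n1, n2, n3):
    """Probability of each total error count when one page is taken
    from each of two copies of the book."""
    # Number of pages having 0, 1, 2 and 3 errors respectively.
    pages = {0: N - n1 - n2 - n3, 1: n1, 2: n2, 3: n3}
    dist = {}
    for (e1, c1), (e2, c2) in product(pages.items(), repeat=2):
        dist[e1 + e2] = dist.get(e1 + e2, 0) + Fraction(c1 * c2, N * N)
    return dist


# Hypothetical book: 100 pages, of which 10, 5 and 2 have one, two and three errors.
N, n1, n2, n3 = 100, 10, 5, 2
dist = total_error_distribution(N, n1, n2, n3)
print(dist[4], dist[5], dist[6])
```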


Ex. 2. Tchebycheff's Problem. Two integers lie within the
range 2 to N. What is the probability that they are prime to
one another?

Any number, when divided by a suspected prime factor r,
may have a remainder 0, 1, ..., r-1; hence the probability that
it is divisible by r is $\frac{1}{r}$. Thus the probability that both the
integers are divisible by r is $\frac{1}{r^2}$, and, therefore, the probability
that they are not both divisible by r is $1 - \frac{1}{r^2}$. It follows that the
probability that the two integers have no common prime factor
over the whole range is

$$x = \left(1-\frac{1}{2^2}\right)\left(1-\frac{1}{3^2}\right)\left(1-\frac{1}{5^2}\right)\cdots\left(1-\frac{1}{p^2}\right),$$

where p is the greatest prime in the given range 2 to N.
If N (and therefore p) is large, we may approximate to x as
follows.

We suppose that x is approximately equal to the infinite
product

$$\prod\left(1-\frac{1}{r^2}\right),$$

where r is always prime.

Then

$$\frac{1}{x} = \left(1+\frac{1}{2^2}+\frac{1}{2^4}+\cdots\right)\left(1+\frac{1}{3^2}+\frac{1}{3^4}+\cdots\right)\cdots,$$

and since any number is either a prime or a product of primes,
it follows on multiplying out that

$$\frac{1}{x} = 1+\frac{1}{2^2}+\frac{1}{3^2}+\cdots+\frac{1}{n^2}+\cdots = \frac{\pi^2}{6}.\dagger$$

Hence $x = \frac{6}{\pi^2} = \frac{3}{5}$, approximately.

Tchebycheff’s problem is sometimes stated in the form: to 
find the probability that the fraction m/n is in its lowest terms, 
m and n being any two integers. 

Note that this process does not give the value of the
probability (which is necessarily a proper fraction) but only an
approximation to it.

In the following example the actual fraction is calculated for
the numbers between 2 and 20 and between 2 and 30.

Thus we find that the number of pairs of numbers between
2 and 20, with no common factors, is 108. The total number of
pairs is ${}^{19}C_{2} = 171$. Applying this result to find an
approximation to $\pi$, we have

$$\frac{6}{\pi^2} = \frac{108}{171} = \frac{12}{19}, \text{ giving } \pi^2 = 9.5, \text{ and } \pi = 3.08.$$

For the range 2 to 30 we find that the number of prime pairs
is 248, while the total number of pairs is ${}^{29}C_{2} = 406$. These
data give

$$\frac{6}{\pi^2} = \frac{248}{406}, \text{ whence } \pi^2 = 9.82, \text{ and } \pi = 3.13.$$
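The counts 108 and 248 are tedious to obtain by hand but trivial to verify by enumeration. The following Python sketch counts the pairs of distinct integers in the range 2 to a given upper limit that have no common factor, and forms the corresponding estimate of $\pi$; the function name is ours.

```python
from itertools import combinations
from math import gcd, pi, sqrt


def coprime_pair_fraction(upper):
    """Return (coprime pairs, all pairs) for distinct integers in 2..upper."""
    pairs = list(combinations(range(2, upper + 1), 2))
    coprime = sum(1 for a, b in pairs if gcd(a, b) == 1)
    return coprime, len(pairs)


for upper in (20, 30):
    coprime, total = coprime_pair_fraction(upper)
    # From 6 / pi^2 ~ coprime / total we get pi ~ sqrt(6 * total / coprime).
    estimate = sqrt(6 * total / coprime)
    print(upper, coprime, total, round(estimate, 2), round(6 / pi**2, 4))
```

Larger ranges may be tried in the same way to see how slowly the observed fraction settles towards $6/\pi^2$.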


† See Hobson, Plane Trigonometry,
Ex. 3. An interesting application of elementary probability is
found in the work of Bunsen and Kirchhoff in connexion with the
discovery of the presence of iron in the sun. By comparing the
spectra of sunlight and incandescent iron vapour it was found
that, to the degree of accuracy given by the instruments, 60
bright lines coincided in the two spectra. Now the average
distance between the solar lines in Kirchhoff's map was 2 mm.,
and coincidence for his instruments implied that a line from the
iron vapour must fall within $\frac{1}{2}$ mm. on either side. Thus the
probability of casual coincidence for each of the 60 lines was
$2\cdot\frac{1}{2}/2 = \frac{1}{2}$. Accordingly, the probability of casual coincidence
for all 60 lines was $\frac{1}{2^{60}}$, or one in a million million millions. It
should be noted that in this analysis iron is defined as that
substance which gives the above 60 lines in the spectrum.

Similar considerations with regard to the coincidence of the
spectra of solar, lunar, and planetary light can be used to
decide the probability that they are all of the same origin.

EXAMPLES ON CHAPTER IV 

[In the following examples it is to be assumed that when the phrase
'a coin is tossed' is used, it is implied that the probability of the
appearance of a head is $\frac{1}{2}$. See also Chapter V.]

Ex. 1. What is the probability of a penny turning up heads at least
once in n throws?

The probability that it turns up tails every time is $\frac{1}{2^n}$. Hence the
probability that it shows heads at least once is $1 - \frac{1}{2^n}$.

Ex. 2. If m coins are tossed and all the heads are removed, and then
the remaining coins are tossed and the heads removed, and so on, what
is the probability that all the coins will be removed by or before n
tossings?

We may imagine all the coins tossed n times; we thus require the
probability that each will turn up heads at least once in n tossings.
This is

$$\left(1 - \frac{1}{2^n}\right)^m.$$

Ex. 3. (Pascal's and Fermat's problem.) Two players, with equal
probability of winning a point, agree to play a game for 5 points. If
the game must not be drawn, find their respective chances of winning
at any given stage of the game.

Ex. 4. An urn contains b black and w white balls. If n balls are
extracted together, what is the probability that α of these are white?

The number of ways in which n balls can be extracted is ${}^{b+w}C_{n}$. The
number of sets of n balls which contain α white balls is

$${}^{w}C_{\alpha}\cdot{}^{b}C_{n-\alpha}.$$
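Dividing the second count by the first gives the probability asked for in Ex. 4. The sketch below carries out that division with exact fractions; the function name and the trial values b = 4, w = 6, n = 3 are our own assumptions, chosen only to illustrate the calculation.

```python
from fractions import Fraction
from math import comb


def prob_alpha_white(b, w, n, alpha):
    """Probability that exactly alpha of n balls drawn together are white."""
    favourable = comb(w, alpha) * comb(b, n - alpha)
    return Fraction(favourable, comb(b + w, n))


# Hypothetical urn: 4 black and 6 white balls, 3 drawn together.
example = prob_alpha_white(4, 6, 3, 2)
total = sum(prob_alpha_white(4, 6, 3, alpha) for alpha in range(0, 4))
print(example, total)
```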

Ex. 5. A number is chosen from each of the two sets 1, 2, 3, ..., 9;
1, 2, 3, ..., 9. Show that the probability that the sum of the numbers
should be 10 is $\frac{1}{9}$ and that their sum should be 8 is $\frac{7}{81}$.

Ex. 6. If in selecting a number from the set 1, 2, 3,..., 9, 7 is chosen 
twice as often as 3, 3 twice as often as 5 and 9, and 5 and 9 twice as 
often as 1, 2, 4, 6, 8, what is the probability that the sum of two numbers 
selected will be 10? 

Ex. 7. A red card is removed from a pack of 52; 13 cards are then 
drawn and found to be of the same colour. Show that the odds are 
2 to 1 that the colour is black. 


Ex. 8. A set consists of n counters. What is the probability that
a selected group of these of unspecified number consists of (1) an even
number of counters, (2) an odd number of counters?

We have to find the total number of members of the groups that can
be formed of 2, 4, 6, ... counters for the case (1) and of 1, 3, 5, ... for the
case (2).

The total number of ways of forming groups of 2, 4, 6, ... is respectively
${}^{n}C_{2}, {}^{n}C_{4}, {}^{n}C_{6}, \ldots$ and for forming the groups 1, 3, 5, ... is
${}^{n}C_{1}, {}^{n}C_{3}, {}^{n}C_{5}, \ldots$.

Thus the number of members of the class of even groups is

$${}^{n}C_{2} + {}^{n}C_{4} + \cdots = 2^{n-1} - 1 \quad\text{(p. 45)}$$

and the number of members of the class of odd groups is

$${}^{n}C_{1} + {}^{n}C_{3} + \cdots = 2^{n-1},$$

while the total number of members of all classes is $2^n - 1$. Thus the
probability of the selected group being odd is greater than its being even.
The difference between the two probabilities decreases as n increases.
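
Ex. 9. From a pack of 52 cards an even number of cards is drawn.
Show that the probability that these consist half of red and half of
black is

$$\left\{\frac{52!}{(26!)^2} - 1\right\}\Big/\left(2^{51} - 1\right).$$

The number of ways in which an even number of cards can be drawn is

$${}^{52}C_{2} + {}^{52}C_{4} + \cdots + {}^{52}C_{52} = 2^{51} - 1 \quad\text{(p. 45)}.$$

Of these, the number of groups consisting half of red and half of black
cards is

$$({}^{26}C_{1})^2 + ({}^{26}C_{2})^2 + \cdots + ({}^{26}C_{26})^2 = \frac{52!}{(26!)^2} - 1,$$

by the result of Ex. 1 on the binomial coefficients. Hence the result.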




Ex. 10. Using Stirling’s theorem, find an approximation to this 
probability. 
Ex. 11. A pack of 52 cards is cut twice, a card drawn and replaced;
show that the probability of obtaining aces each time is 1/169.

Ex. 12. A and B stand in a line with 10 other persons. What is the
probability that there are 3 persons between A and B? What would
be the probability if they stood in a ring?

Ex. 13. Find the probability that a month contains portions of six 
different weeks. 

Ex. 14. Two identical urns contain respectively n and n' balls; the
first urn contains a white balls and the second a'. If a ball is extracted
from one of the two urns, what is the probability that it is white?

It must be noticed that the extraction of a white ball from the first
urn is the result of two circumstances: (1) the choice of this urn from
the two identical urns, (2) the extraction of a white ball from this urn,
supposing that it has actually been chosen. The probability of (1) is $\frac{1}{2}$,
that of (2) is $\frac{a}{n}$. Thus, the probability of extraction of a white ball
from the first urn is $\frac{1}{2}\cdot\frac{a}{n}$; and similarly, the probability of extraction of a
white ball from the second urn is $\frac{1}{2}\cdot\frac{a'}{n'}$. Hence the required probability,
which is the sum of these two probabilities, is $\frac{1}{2}\left(\frac{a}{n} + \frac{a'}{n'}\right)$.

Ex. 15. Each of two bags contains m shillings and n sixpences. If
a coin is drawn from each bag, show that the probability that both 
coins are shillings is greater than that of drawing two shillings from 
a bag containing all the coins. 

Ex. 16. In a card game in which the dealer's last card determines the
trump suit, find how many hands must be dealt in order that it is more
likely than not that at some stage the dealer will hold all the trumps.
Since the dealer always holds one of the trumps, the probability of
any one deal of the required type is $\frac{1}{c}$, say, where c is a large
number.

The probability of not holding all the trumps is thus $1 - \frac{1}{c}$. After
x deals, this probability is

$$\left(1-\frac{1}{c}\right)^{x} = \left\{\left(1-\frac{1}{c}\right)^{c}\right\}^{x/c} = e^{-x/c}, \text{ approximately.}$$

For an even chance we require $e^{-x/c} = \frac{1}{2}$.

This equation gives $x = c\log 2 = 10^{11}$, approximately.
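The order of magnitude $10^{11}$ can be recovered numerically once a value is given to c. The text as it stands does not state c explicitly; the sketch below assumes, as a reading of the problem, that the dealer's other 12 cards must be exactly the 12 remaining trumps among the 51 other cards, so that c = ${}^{51}C_{12}$. Both this value of c and the function name are our assumptions.

```python
from math import comb, log

# Assumption (not stated in the text as it stands): the dealer's last card fixes
# the trump suit, so the other 12 cards of the hand must be exactly the 12
# remaining trumps out of the 51 other cards, giving c = 51C12.
c = comb(51, 12)


def deals_for_even_chance(c):
    """Solve exp(-x / c) = 1/2 for x, i.e. x = c * log(2) (natural logarithm)."""
    return c * log(2)


x = deals_for_even_chance(c)
print(c, x)
```

Under this assumption x comes out a little above $10^{11}$, in agreement with the approximate figure quoted.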



CHAPTER V 


BERNOULLI'S THEOREM 

1. Bernoulli’s Theorem and its extensions 

In dealing with a class of objects or events, we shall use the
term 'population' to describe the original class from which
the subclasses are to be formed.

Suppose that we are given a population of ten counters
divided into two subclasses which we represent by four black
counters b and six white counters w. What is the probability
that among three unspecified members of the population just
two are members of the subclass w?

We may proceed as follows. The probability that a member
of the population is a member of w is $\frac{6}{10} = \frac{3}{5}$; hence the
probability that two members, as a group, are members of w, is
$\frac{3}{5}\times\frac{3}{5}$. To satisfy our conditions, the third member must not
belong to w; thus the probability required would appear to be
$\frac{3}{5}\times\frac{3}{5}\times(1-\frac{3}{5})$. But the order in which the three members have
been considered as belonging (or not) to the subclass w is not
exhausted by this particular process; it could be either the
second or the first member which is excluded from w. Thus the
total probability is

$$3\times\tfrac{3}{5}\times\tfrac{3}{5}\times(1-\tfrac{3}{5}), \quad\text{or}\quad {}^{3}C_{2}\times\tfrac{3}{5}\times\tfrac{3}{5}\times(1-\tfrac{3}{5}) = \tfrac{54}{125}.$$

This simple problem is an illustration of the general result. 

Bernoulli’s Theorem†

Let a population be divisible into subclasses b and w such that
the probability of any member of the population being also a
member of w is p. Then, of n objects defined only as members of
the population, the probability that r of these are also members of
w is ${}^{n}C_{r}\,p^{r}(1-p)^{n-r}$.

For the probability of r members of the population being
members of w is, as we have seen, $p^{r}$; the probability that the
remaining $n-r$ members are not members of w is $(1-p)^{n-r}$.
Thus the combined probability of the double event is $p^{r}(1-p)^{n-r}$.
But the r members of the group of n initially considered can be
exhaustively selected in ${}^{n}C_{r}$ ways. Then, since the total
probability required is the sum of the separate probabilities, it is
equal to ${}^{n}C_{r}\,p^{r}(1-p)^{n-r}$.

† In interpreting the probability p in the following theorem, reference should
be made to the discussion on a priori probability on p. 19.


Ex. 1. Thirteen cards are drawn one by one from an ordinary pack of 52,
each card being replaced immediately after drawing: to find the
probability that exactly 3 red cards are so obtained.

There are initially 26 red and 26 black cards in the pack, so that the
probability p that a card should be red is $\tfrac12$. In our theorem, as applied
to the present problem, the group of objects to be considered is in
number n = 13, and the sub-group is in number r = 3. Hence the
required probability is

$${}^{13}C_{3}\left(\tfrac12\right)^{3}\left(1-\tfrac12\right)^{13-3}=\frac{143}{4096}=\frac{1}{28},\ \text{approximately.}$$

Note, in contrast, that the probability of finding 3 red cards in a hand
of 13, as ordinarily dealt, is ${}^{26}C_{3}\,{}^{26}C_{10}/{}^{52}C_{13}$.
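The contrast drawn in Ex. 1 between drawing with replacement and an ordinary deal can be checked with exact fractions. This is a minimal sketch; the function names `bernoulli` and `dealt_hand` are illustrative and do not come from the text.

```python
from fractions import Fraction
from math import comb

def bernoulli(n, r, p):
    """Bernoulli probability nCr * p^r * (1 - p)^(n - r), computed exactly."""
    p = Fraction(p)
    return comb(n, r) * p**r * (1 - p)**(n - r)

def dealt_hand(reds, blacks, hand, r):
    """Probability of exactly r red cards in a hand dealt without replacement."""
    return Fraction(comb(reds, r) * comb(blacks, hand - r),
                    comb(reds + blacks, hand))

with_replacement = bernoulli(13, 3, Fraction(1, 2))
ordinary_deal = dealt_hand(26, 26, 13, 3)
print(with_replacement, float(with_replacement))
print(ordinary_deal, float(ordinary_deal))
```

The same `bernoulli` function reproduces the counter problem at the head of the chapter when called as `bernoulli(3, 2, Fraction(3, 5))`.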


Ex. 2. What is the number of red cards, in such an extraction, for 
which the probability is greatest ? 

We have to find the value of r which makes
${}^{13}C_{r}\left(\tfrac12\right)^{r}\left(1-\tfrac12\right)^{13-r}$ have its
greatest value.

Evidently this is attained when r = 6 or 7.


Ex. 3. What is the probability that no more than three of the cards 
should be red ? 

This is the sum of the probabilities that the number of red cards 
should be 0, 1, 2, or 3.


Ex. 4. Find the probability that the hand should contain at least 
three red cards. 


Ex. 5. What is the probability that, in 13 drawings, with
replacement, an ace should be obtained four times?

The original probability that a card should be an ace is
$\tfrac{4}{52}=\tfrac{1}{13}$.

Hence the required probability is
${}^{13}C_{4}\left(\tfrac{1}{13}\right)^{4}\left(\tfrac{12}{13}\right)^{9}=0.012$, approximately.

It should be noticed that the fact that the four specified cards are to
be aces is quite irrelevant to the problem; the same probability would
be found for the occurrence of any four previously indicated cards.


From Bernoulli’s Theorem we at once derive the following:

Theorem. If p is the initial probability that a member of a
population should belong to a specified subclass, the probability
that out of n members not more than r belong to this subclass is

$${}^{n}C_{0}(1-p)^{n}+{}^{n}C_{1}\,p(1-p)^{n-1}+\dots+{}^{n}C_{r}\,p^{r}(1-p)^{n-r}.$$

With the same hypotheses we have:

Theorem. The probability that not less than r members belong
to the specified subclass is

$${}^{n}C_{r}\,p^{r}(1-p)^{n-r}+{}^{n}C_{r+1}\,p^{r+1}(1-p)^{n-r-1}+\dots+{}^{n}C_{n}\,p^{n}.$$

Ex. 1. Out of a population of pennies, of which half lie head up and
half tail up, a class of 2n members is defined. What is the probability
that these show heads in excess or defect of n, by a number t?

Evidently the required probability is

$${}^{2n}C_{n-t}\,p^{n-t}(1-p)^{n+t}+\dots+{}^{2n}C_{n+t}\,p^{n+t}(1-p)^{n-t},\quad\text{where } p=\tfrac12.$$

This type of problem is usually stated in the form: A penny is tossed
2n times. What is the probability that the deviation from n heads
should not exceed t? Note that in attempting to identify these two
problems we tacitly assume that the sample of 2n tossings is drawn
from a larger hypothetical population containing precisely the same
number of exposed heads as tails.

Ex. 2. With the same interpretation of the terms, show that, if a
penny is tossed n times, the probability of not more than r heads is

$$\frac{1}{2^{n}}\left({}^{n}C_{0}+{}^{n}C_{1}+\dots+{}^{n}C_{r}\right).$$
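The two cumulative theorems above are simply partial sums of Bernoulli terms, which the following sketch evaluates exactly. The names `at_most` and `at_least` are illustrative, not from the text; the sample values are those of the thirteen drawings with replacement considered earlier.

```python
from fractions import Fraction
from math import comb

def term(n, k, p):
    """Single Bernoulli term nCk * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def at_most(n, r, p):
    """Probability that not more than r of n members belong to the subclass."""
    p = Fraction(p)
    return sum(term(n, k, p) for k in range(0, r + 1))

def at_least(n, r, p):
    """Probability that not less than r of n members belong to the subclass."""
    p = Fraction(p)
    return sum(term(n, k, p) for k in range(r, n + 1))

half = Fraction(1, 2)
print(at_most(13, 3, half))   # no more than three red cards in 13 drawings
print(at_least(13, 3, half))  # at least three red cards in 13 drawings
```

Since "not more than r - 1" and "not less than r" exhaust all cases, the two sums for adjacent r always add to unity, which gives a convenient check.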

Applications of Mathematical Probability 

It will be observed that the language in which these theorems 
have been developed and the form in which the examples have 
been couched have been such as scrupulously to avoid all idea 
of experiment. If we are to restrict our investigations in this 
way we shall certainly avoid the error of confusing psychological 
expectation with mathematical probability; but we shall also 
lose the possibility of applying the theory to actual cases. What 
we have to discover are the circumstances in which such
application is legitimate. It was pointed out previously that the study
of psychological probability ought logically to follow in the
wake of the mathematical investigation. At this stage, therefore,
we propose to examine briefly the restrictions hitherto
imposed, and to see if they can be circumvented. 

It must be understood, then, that when we say that ‘a card 
is drawn from a pack’, we mean in fact that we are to discuss 
certain properties of an entity defined only as a member of the 
pack. In the same way, when we say that an individual tosses a 
penny n times, we mean that n events are under consideration 
and that each of them may belong to one of two classes, head 
or tail: that is the defining property of the event. If the result
is used in any particular case, the onus of justification is on the
user who asserts that this defining property is the only relevant 
one in the circumstances in which he applies the result. 

In this connexion we may remark that in certain circumstances
it is possible to introduce into the defining properties
conditions relating to the mode of selection or arrangement
which enable the mathematical treatment to provide an answer
which is closer to the facts than the result arrived at on a
simple hypothesis. For example, suppose that there are ten
counters in a row: five black on the left and five white on
the right; if all that is asserted about a counter is that it belongs
to this group, then on our definition the probability of its being
white is $\tfrac12$. If, however, we assert that an individual has selected
a counter, then the fact that individuals more frequently choose
with their right hand than with their left, and thus more
frequently choose an object to the right of the centre of the group
than to the left, will vitiate our original calculations and we must
introduce a new factor which takes this human bias into account.


Now suppose it is known that the choice made by an individual
justifies our statement that the probabilities of choice
of the counters, from left to right, are proportional to the
numbers

    1, 1, 1, 1, 2 (black);   3, 4, 4, 2, 1 (white).

Then the problem may be recast in the form: Given a set of
20 counters of which 6 are black and 14 are white, the
probability of a white counter is $\tfrac{14}{20}=\tfrac{7}{10}$. Thus, by introducing ‘weighting’
factors to represent the bias in choice of the counters, we have
brought the original problem nearer to actuality. In the
mathematical problem, these weighting factors must be
supposed given; actually, they are given as a result of previous
experiment, so that in such a problem they become known a
priori.
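The recasting of the biased choice as a set of 20 weighted counters can be written out directly. A minimal sketch, using the weights quoted above; the function name is illustrative only.

```python
from fractions import Fraction

# Weights from the text, read from left to right along the row:
# five black counters followed by five white counters.
WEIGHTS = [1, 1, 1, 1, 2, 3, 4, 4, 2, 1]
COLOURS = ["black"] * 5 + ["white"] * 5

def weighted_probability(colour):
    """Probability of drawing the given colour when each counter is
    chosen with probability proportional to its weight."""
    favourable = sum(w for w, c in zip(WEIGHTS, COLOURS) if c == colour)
    return Fraction(favourable, sum(WEIGHTS))

print(weighted_probability("white"), weighted_probability("black"))
```

With equal weights the same function returns 1/2 for either colour, which is the unweighted case described first.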


Ex. A sniper finds that, on the average, he kills once in three shots.
He fires three times at an enemy; on the assumption that his a priori
probability of killing is $\tfrac13$, what is the probability that he kills him?

Here we require the probability that at least one of the shots should
be a hit. Since $p=\tfrac13$, the required probability is

$${}^{3}C_{1}\left(\tfrac13\right)\left(\tfrac23\right)^{2}+{}^{3}C_{2}\left(\tfrac13\right)^{2}\left(\tfrac23\right)+{}^{3}C_{3}\left(\tfrac13\right)^{3}=\tfrac{19}{27}.$$

Alternatively, we may proceed as follows. The probability of not
killing at the first attempt is $\tfrac23$; thus, the probability of not killing in
all three attempts is $\left(\tfrac23\right)^{3}=\tfrac{8}{27}$. Hence the probability of a hit is $\tfrac{19}{27}$.

We note that there is no contradiction between this result and the
statement that $p=\tfrac13$, for the probability of killing with one shot only
is $\tfrac13$: after that the probability increases.

Greatest Value of ${}^{n}C_{r}\,p^{r}(1-p)^{n-r}$

To determine the value of r for which the Bernoulli
probability, $B(n,r)={}^{n}C_{r}\,p^{r}(1-p)^{n-r}$, has its greatest value, where r is
an integer, we cannot legitimately discuss the variation of B(n, r)
as a continuous function of r; we are not seeking for a maximum
but for a greatest value (if it exists) in the range $0\le r\le n$.

Accordingly, we require to find the value of r such that

$$B(n,r-1)\le B(n,r)\ge B(n,r+1),$$

i.e. such that

$${}^{n}C_{r-1}\,p^{r-1}(1-p)^{n-r+1}\le{}^{n}C_{r}\,p^{r}(1-p)^{n-r}\ge{}^{n}C_{r+1}\,p^{r+1}(1-p)^{n-r-1}.$$

Cancelling out positive factors in common it follows that

$$np+p\ge r\ge np-(1-p).$$

Since p and 1-p are fractions, the range from np-(1-p) to np+p is
of unit length and contains np. We thus require that r should
be equal to np, if this number is integral, or otherwise to the
integer adjacent to np which lies within this range. (If np+p is
itself integral, both ends of the range give the same greatest
value.) We thus obtain the following result.

The greatest value of Bernoulli's probability B(n, r) is obtained
by taking r to be np, or, if np is not integral, an integer lying
between np-(1-p) and np+p inclusive.
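The inequality $np+p\ge r\ge np-(1-p)$ can be compared with a direct search over all values of r. The sketch below is illustrative only (the function names are not from the text) and assumes 0 < p < 1; it uses exact fractions so that ties between two adjacent values of r are detected reliably.

```python
from fractions import Fraction
from math import comb

def bernoulli(n, r, p):
    """B(n, r) = nCr * p^r * (1 - p)^(n - r), computed exactly."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def most_probable(n, p):
    """Every r in 0..n at which B(n, r) attains its greatest value."""
    p = Fraction(p)
    values = [bernoulli(n, r, p) for r in range(n + 1)]
    greatest = max(values)
    return [r for r, v in enumerate(values) if v == greatest]

def admissible_range(n, p):
    """Integers r satisfying np - (1 - p) <= r <= np + p."""
    p = Fraction(p)
    low, high = n * p - (1 - p), n * p + p
    return [r for r in range(n + 1) if low <= r <= high]

print(most_probable(13, Fraction(1, 2)))   # thirteen drawings, red cards
print(most_probable(13, Fraction(1, 13)))  # thirteen drawings, aces
```

For the values tried here the direct search and the inequality pick out the same integers, including the case of a tie.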

Ex. How many aces are ‘most likely’ to be found in 13 successive 
drawings, followed by replacements, from a pack of 52 cards ? 


First Generalization of Bernoulli’s Theorem

Let a population be divisible into subclasses $w_1, w_2, \dots, w_s$, the
probabilities attached to the subclasses being $p_1, p_2, \dots, p_s$. Then,
the probability that a group of n members of the population,
otherwise unspecified, should contain $r_1$ members of $w_1$, $r_2$ of $w_2$, ..., and
$r_s$ of $w_s$ is

$$\frac{n!}{r_1!\,r_2!\cdots r_s!}\,p_1^{r_1}p_2^{r_2}\cdots p_s^{r_s},$$

where $r_1+r_2+\dots+r_s=n$.

For, the probability of $r_1$ members of the population being
members of $w_1$ is $p_1^{r_1}$; of $r_2$ members belonging to $w_2$ is $p_2^{r_2}$, and
so on. Thus the probability of the combined event is $p_1^{r_1}p_2^{r_2}\cdots p_s^{r_s}$.
But the required probability is the sum over the ways in which
this combination can be formed subject to the condition that
the total number of members is n. Hence we have to multiply
$p_1^{r_1}p_2^{r_2}\cdots p_s^{r_s}$ by the number of ways in which a term of this type
can arise by n-combinations of all such p’s; such a number is
identical with the coefficient of $p_1^{r_1}p_2^{r_2}\cdots p_s^{r_s}$ in the expansion of
$(p_1+p_2+\dots+p_s)^{n}$ (p. 47), whence the result.

The original Bernoulli Theorem follows from this by putting

$$p_1=p,\quad p_2=1-p,\quad r_1=r,\quad r_2=n-r.$$

Ex. A pack of 10 cards consists of 3 aces, 2 kings, 2 queens, and 
3 jacks. All that is known of them is that on eight successive occasions 
the cards have been shuffled and the top card each time exposed. It is 
required to find the probability that an ace will have been top card on 
two occasions, a queen on three occasions, and a jack on three occasions. 

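If we denote by $w_1$, $w_2$, $w_3$, and $w_4$ the respective subclasses defined by
the aces, kings, queens, and jacks, then in the previous notation we have

$$n=8,\quad r_1=2,\quad r_2=0,\quad r_3=3,\quad r_4=3,$$

$$p_1=\tfrac{3}{10},\quad p_2=\tfrac{2}{10},\quad p_3=\tfrac{2}{10},\quad p_4=\tfrac{3}{10}.$$

Hence the required probability is

$$\frac{8!}{2!\,0!\,3!\,3!}\left(\frac{3}{10}\right)^{2}\left(\frac{2}{10}\right)^{0}\left(\frac{2}{10}\right)^{3}\left(\frac{3}{10}\right)^{3}=\frac{8!\cdot 27}{10^{8}}=\frac{108{,}864}{10^{7}}=\frac{1}{100},$$

approximately.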
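The generalized formula lends itself to a short exact computation. The sketch below is illustrative; `multinomial_probability` is a hypothetical helper name, and the sample values are those of the ten-card example above.

```python
from fractions import Fraction
from math import factorial

def multinomial_probability(counts, probs):
    """n! / (r_1! ... r_s!) * p_1^r_1 ... p_s^r_s, with n = sum of counts."""
    n = sum(counts)
    coefficient = factorial(n)
    for r in counts:
        coefficient //= factorial(r)  # stays integral at every step
    result = Fraction(coefficient)
    for r, p in zip(counts, probs):
        result *= Fraction(p) ** r
    return result

tenths = [Fraction(3, 10), Fraction(2, 10), Fraction(2, 10), Fraction(3, 10)]
answer = multinomial_probability([2, 0, 3, 3], tenths)
print(answer, float(answer))
```

With only two subclasses the function reduces to the original Bernoulli probability, as the text remarks.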

Alternative statement of Bernoulli's Theorem. The probability
that an event with initial probability p occurs exactly r times
in n trials is the rth term in the expansion of $(p+q)^{n}$ in
ascending powers of p, where $q=1-p$.

It follows that the sum of the probabilities for all values of r
is unity.

Again, the average value of r in n trials is

$$\sum_{r=0}^{n}{}^{n}C_{r}\,p^{r}q^{n-r}\cdot r.$$

Now

$$(p+q)^{n}=\sum_{r=0}^{n}{}^{n}C_{r}\,p^{r}q^{n-r}.$$

Differentiating this identity with respect to p, and then putting
p+q = 1, we have

$$n=\sum_{r=0}^{n}{}^{n}C_{r}\,r\,p^{r-1}q^{n-r},$$

and thus

$$np=\sum_{r=0}^{n}{}^{n}C_{r}\,p^{r}q^{n-r}\cdot r.$$

But this is also an approximation to the most probable value
of r. It follows that to this degree of accuracy the average value
of r is the most probable value when all possibilities are taken
into consideration.
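Both statements, that the probabilities sum to unity and that the average value of r is np, can be confirmed for particular n and p. A minimal sketch with illustrative function names:

```python
from fractions import Fraction
from math import comb

def distribution(n, p):
    """Terms nCr * p^r * q^(n - r) for r = 0, 1, ..., n, with q = 1 - p."""
    p = Fraction(p)
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

def average(n, p):
    """Average value of r, i.e. the sum of r times its probability."""
    return sum(r * prob for r, prob in enumerate(distribution(n, p)))

print(sum(distribution(10, Fraction(1, 3))), average(10, Fraction(1, 3)))
```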

Case of Probability varying from one Trial to another 

It has been assumed throughout the foregoing analysis that
the successive stages in the withdrawal of a sample from a
population are not accompanied by any change in the
probabilities of its subclasses; this is the case when, for example,
the population consists of a set of black and white balls, and
the ball is replaced after each withdrawal, or where the
population is generated by an operation, as in the tossing of a coin. If,
however, this is not done, the proportion of black to white balls is
altered at each stage of the process, and the initial probability of
a black or white ball becomes a function of the number of samples.

Ex. 1. If the probability of failing at the nth trial is $1/(1+n)$, what
is the probability of succeeding at least once in the first m trials?

Ex. 2. If the probability of failure at the nth trial is $1/2^{n}$, find the
probability of succeeding at least once in three trials.

Second Generalization of Bernoulli’s Theorem 

Instead of referring to a population and the probability of 
its subclass, we may speak of an event and the probability of its 
success in one or more trials (corresponding, for instance, to the 
extraction of one or more white balls from an urn containing 
black and white balls). Suppose then that we consider n
independent events whose probabilities of success are $p_1, p_2, \dots, p_n$;
thus the corresponding probabilities of failure are $q_1=1-p_1$,
$q_2=1-p_2$, ..., $q_n=1-p_n$. Then the probability of obtaining
exactly r successes in the compound event is

$$\sum p_i\,p_j\,p_k\cdots q_l\,q_m\cdots,$$

the summation extending to all products of n different symbols,
each containing r p’s and n-r q’s. It will be noticed that this
is the coefficient of $x^{r}$ in the product

$$(p_1x+q_1)(p_2x+q_2)\cdots(p_nx+q_n).$$

Hence,

The probability of obtaining r successes in a compound event,
consisting of n independent events, is equal to the coefficient of $x^{r}$
in $(p_1x+q_1)(p_2x+q_2)\cdots(p_nx+q_n)$, where $p_s$, $q_s$ are the respective
probabilities of success and failure in the sth event.

Ex. Given three urns, of which the first contains 3 white and 4 black 
balls, the second contains 2 white and 3 black balls, and the third 
contains 3 white and 5 black balls, what is the probability of obtaining 
one white ball in extracting a ball from each urn? 

Evidently the required probability is the coefficient of x in

$$\left(\tfrac37x+\tfrac47\right)\left(\tfrac25x+\tfrac35\right)\left(\tfrac38x+\tfrac58\right),\quad\text{i.e. } \tfrac{121}{280}.$$
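The coefficient extraction described in this theorem amounts to multiplying out the n linear factors one at a time. The following sketch does so with exact fractions; `success_distribution` is an illustrative name, and the three probabilities are those of the urn example.

```python
from fractions import Fraction

def success_distribution(probs):
    """Coefficients of x^0, x^1, ..., x^n in the product of (p_s x + q_s).

    Entry r of the result is the probability of exactly r successes.
    """
    coeffs = [Fraction(1)]
    for p in probs:
        p = Fraction(p)
        q = 1 - p
        longer = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            longer[k] += c * q       # failure: power of x unchanged
            longer[k + 1] += c * p   # success: power of x raised by one
        coeffs = longer
    return coeffs

urns = [Fraction(3, 7), Fraction(2, 5), Fraction(3, 8)]
print(success_distribution(urns))
```

The entry of index 0 is the probability of no success at all, so one minus that entry gives the chance of at least one success when the probability varies from trial to trial.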

2. Bernoulli’s Theorem and the normal law 

Stirling's Theorem 

We have already noted (p. 44) the use that an approximate
formula for $n!$ may have in evaluating probabilities. In what
follows the use of such approximations is essential.

Fig. 2.

We begin by finding an approximation, for large values of
n, to $\log n! = \log n+\log(n-1)+\dots+\log 2$.

Consider the curve representing the function

$$y=\log x.$$

If ordinates be erected at x = 1, 2,..., n, then the sum of
the trapezia determined by successive pairs of ordinates will be
less than the total area between the curve, the nth ordinate, and
the x-axis.

Thus

$$\int_{1}^{n}\log x\,dx>\tfrac12(\log 1+\log 2)+\tfrac12(\log 2+\log 3)+\dots+\tfrac12\{\log(n-1)+\log n\},$$

i.e.

$$\Big[x\log x-x\Big]_{1}^{n}>\log 2+\log 3+\dots+\log n-\tfrac12\log n,$$

or

$$n\log n-n+1>\log n!-\log n^{\frac12},$$

or

$$\log\left(n^{n}e^{-n+1}\right)>\log\left(n!\,n^{-\frac12}\right).$$

Since the logarithms are positive we have

$$n^{n}e^{-n+1}>n!\,n^{-\frac12},$$

or

$$n!<n^{n+\frac12}e^{-n+1}.$$

It is clear that to obtain a closer approximation to $n!$ we
require a more exact estimate of $n!/n^{n+\frac12}e^{-n}$.

Write $u_n=\log\left(n!/n^{n+\frac12}e^{-n}\right)$. Then

$$u_{n+1}-u_n=\log\left\{\frac{(n+1)!}{(n+1)^{n+\frac32}e^{-n-1}}\cdot\frac{n^{n+\frac12}e^{-n}}{n!}\right\}$$

$$=\log\left\{e\left(\frac{n}{n+1}\right)^{n+\frac12}\right\}$$

$$=1+\left(n+\tfrac12\right)\log\frac{n}{n+1}$$

$$=1-\left(n+\tfrac12\right)\log\left(1+\frac1n\right)$$

$$=1-\left(n+\tfrac12\right)\left\{\frac1n-\frac{1}{2n^{2}}+\frac{1}{3n^{3}}-\frac{1}{4n^{4}}+\dots\right\}$$

$$=-\frac{1}{12n^{2}}+\frac{1}{12n^{3}}-\frac{3}{40n^{4}}+\frac{1}{15n^{5}}-\dots$$

$$=\frac{1}{12}\left[\frac{1}{n+1}-\frac{1}{n}\right],\ \text{approximately.}$$

Accordingly we may write

$$u_n=A+\frac{1}{12n},$$

where A is an unspecified constant.

Hence we derive

$$\frac{n!}{n^{n+\frac12}e^{-n}}=Be^{1/12n},$$

where B is another unspecified constant.

To find its value we have only to use the above approximation
for $n!$ for a particular case, say n = 9. This gives

$$B=2.507,\ \text{approximately.}$$

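There is, however, a more general method of approach to the
evaluation of B. We begin with the well-known formula

$$\sin\pi x=\pi x\left(1-\frac{x^{2}}{1^{2}}\right)\left(1-\frac{x^{2}}{2^{2}}\right)\left(1-\frac{x^{2}}{3^{2}}\right)\cdots$$

When $x=\tfrac12$, we have

$$1=\frac{\pi}{2}\left(1-\frac{1}{2^{2}}\right)\left(1-\frac{1}{4^{2}}\right)\left(1-\frac{1}{6^{2}}\right)\cdots$$

or

$$\frac{2}{\pi}=\frac{1\cdot3}{2^{2}}\cdot\frac{3\cdot5}{4^{2}}\cdot\frac{5\cdot7}{6^{2}}\cdots=\frac{1^{2}\cdot3^{2}\cdot5^{2}\cdot7^{2}\cdots}{2^{2}\cdot4^{2}\cdot6^{2}\cdots}=\frac{1^{2}\cdot2^{2}\cdot3^{2}\cdot4^{2}\cdots}{2^{4}\cdot4^{4}\cdot6^{4}\cdots}.$$

Thus

$$\frac{2}{\pi}=\lim_{n\to\infty}\frac{\{(2n+1)!\}^{2}}{(2n+1)(n!)^{4}\,2^{4n}}.$$

Inserting our approximation for $n!$, we find that

$$\frac{2}{\pi}=\lim_{n\to\infty}\frac{\left[B(2n+1)^{2n+\frac32}\exp(-2n-1)\exp\left\{\dfrac{1}{12(2n+1)}\right\}\right]^{2}}{2^{4n}(2n+1)\left[Bn^{n+\frac12}\exp(-n)\exp\left\{\dfrac{1}{12n}\right\}\right]^{4}}$$

$$=\lim_{n\to\infty}\frac{1}{B^{2}}\exp\left\{-4n-2+4n+\frac{1}{6(2n+1)}-\frac{1}{3n}\right\}\frac{(2n+1)^{4n+2}}{2^{4n}n^{4n+2}}$$

$$=\frac{4}{B^{2}e^{2}}\lim_{n\to\infty}\left(\frac{2n+1}{2n}\right)^{4n+2}=\frac{4}{B^{2}e^{2}}\lim_{n\to\infty}\left(1+\frac{1}{2n}\right)^{4n}\left(1+\frac{1}{2n}\right)^{2}$$

$$=\frac{4}{B^{2}}.$$

Hence $B=\sqrt{2\pi}$ and, finally, we have the approximate formula,
for large values of n,

$$n!=\sqrt{2\pi}\,n^{n+\frac12}\exp\left(-n+\frac{1}{12n}\right)$$

$$=\sqrt{2\pi}\,n^{n+\frac12}e^{-n}\left(1+\frac{1}{12n}\right),\ \text{approximately.}$$

For comparison, it may be remarked that we have already
found that $n!<n^{n+\frac12}e^{-n}\cdot e$.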
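The quality of these approximations is easy to examine numerically. The sketch below is illustrative only (the function names are not from the text); it compares $n!$ with Stirling's formula, with and without the factor $e^{1/12n}$, with the cruder upper bound, and recovers B from a single value of n in the manner suggested above.

```python
from math import exp, factorial, pi, sqrt

def stirling(n, correction=True):
    """sqrt(2 pi) * n^(n + 1/2) * e^(-n), optionally times e^(1/(12 n))."""
    value = sqrt(2 * pi) * n ** (n + 0.5) * exp(-n)
    return value * exp(1 / (12 * n)) if correction else value

def crude_bound(n):
    """The earlier upper bound n^(n + 1/2) * e^(-n + 1)."""
    return n ** (n + 0.5) * exp(-n + 1)

def estimate_B(n):
    """B obtained from n! = B * n^(n + 1/2) * e^(-n) * e^(1/(12 n))."""
    return factorial(n) / (n ** (n + 0.5) * exp(-n) * exp(1 / (12 * n)))

for n in (5, 9, 20):
    print(n, factorial(n), stirling(n, correction=False), stirling(n))
print(estimate_B(9), sqrt(2 * pi))
```

Even for n = 9 the estimate of B agrees with $\sqrt{2\pi}$ to several significant figures.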


Approximate Value of Bernoulli's Probability (case $p=\tfrac12$)

Suppose that we are given a population of coins which shows
an equal number of heads and tails; we seek the probability
that in a large sample of 2n coins there shall be n+r heads and
n-r tails, that is, that the excess of the number of heads over
the number of tails is 2r. In this case, the probability p of a
head or tail is $\tfrac12$, and Bernoulli’s probability gives us the
formula

$$P={}^{2n}C_{n+r}\left(\tfrac12\right)^{n+r}\left(\tfrac12\right)^{n-r}=\frac{(2n)!}{(n+r)!\,(n-r)!}\cdot\frac{1}{2^{2n}}.$$

Using Stirling’s formula we write

$$(2n)!=\sqrt{2\pi\cdot2n}\,(2n)^{2n}e^{-2n}=2^{2n+1}n^{2n+\frac12}e^{-2n}\sqrt{\pi},$$

$$(n+r)!=\sqrt{2\pi(n+r)}\,(n+r)^{n+r}e^{-n-r},$$

$$(n-r)!=\sqrt{2\pi(n-r)}\,(n-r)^{n-r}e^{-n+r},$$

so that

$$(n+r)!\,(n-r)!=2\pi e^{-2n}(n+r)^{n+r+\frac12}(n-r)^{n-r+\frac12}=2\pi e^{-2n}(n^{2}-r^{2})^{n+\frac12}\left(\frac{n+r}{n-r}\right)^{r}.$$

Hence

$$P=\frac{(2n)!}{(n+r)!\,(n-r)!}\cdot\frac{1}{2^{2n}}=\frac{2^{2n+1}n^{2n+\frac12}e^{-2n}\sqrt{\pi}}{2\pi e^{-2n}n^{2n+1}}\left(1-\frac{r^{2}}{n^{2}}\right)^{-n-\frac12}\left(\frac{n-r}{n+r}\right)^{r}\frac{1}{2^{2n}}$$

$$=\frac{1}{\sqrt{\pi n}}\left(1-\frac{r^{2}}{n^{2}}\right)^{-n-\frac12}\left(\frac{n-r}{n+r}\right)^{r}.$$

It will appear from the more general investigation on p. 71
that, when r/n is small, the approximate value of
$\left(1-\dfrac{r^{2}}{n^{2}}\right)^{-n-\frac12}\left(\dfrac{n-r}{n+r}\right)^{r}$ is $e^{-r^{2}/n}$.

Accordingly, $P=\dfrac{1}{\sqrt{\pi n}}\,e^{-r^{2}/n}$ is the approximate probability
that, in a sample of magnitude 2n, there will be a discrepancy
of r heads above (or below) n, provided r/n is small in
comparison with unity.



Fig. 3. (Horizontal axis: discrepancy r; size of sample = 2n.)


In a sample of size 2n the probability that the number of
heads will lie between n+s and n-s is therefore approximately

$$P_s=\sum_{r=-s}^{s}\frac{1}{\sqrt{\pi n}}\,e^{-r^{2}/n},\quad\text{where } r=0,\pm1,\pm2,\dots,\pm s.$$

The general variation of the term to be summed is shown in the
figure.

To estimate the value of $P_s$ we write

$$r=x\sqrt{n},$$

and since the increment of r is unity,

$$r+1=(x+\delta x)\sqrt{n},$$

so that $\delta x=1/\sqrt{n}$. Thus

$$P_s=\frac{1}{\sqrt{\pi}}\sum_{x=-s/\sqrt{n}}^{s/\sqrt{n}}e^{-x^{2}}\,\delta x=\frac{1}{\sqrt{\pi}}\int_{-s/\sqrt{n}}^{s/\sqrt{n}}e^{-x^{2}}\,dx,\ \text{approximately,}$$

$$=\frac{2}{\sqrt{\pi}}\int_{0}^{s/\sqrt{n}}e^{-x^{2}}\,dx.$$

The function $\dfrac{2}{\sqrt{\pi}}\displaystyle\int_{0}^{x}e^{-x^{2}}\,dx$ is known as the Probability Integral
or the Error Function, and is denoted by Erf x (see Appendix).

Ex. 1. From a population containing equal numbers of boys and girls 
a sample of 1,800 is selected. Find the probability that the number of 
girls will differ from the number of boys by more than 100. 

We seek first the probability that this excess will not occur. We have
2n = 1,800, s = 50; then the probability that the number of girls is
between n+s = 950 and n-s = 850 is

$$\frac{2}{\sqrt{\pi}}\int_{0}^{50/\sqrt{900}}e^{-x^{2}}\,dx=\frac{2}{\sqrt{\pi}}\int_{0}^{5/3}e^{-x^{2}}\,dx.$$

From the table (p. 197) we find that Erf(5/3) = 0.9816.

Thus the probability that the difference is greater than this is
1 - 0.9816 = 0.0184, or 1.8 chances in 100.
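The tabulated value can be compared both with a library error function and with the exact Bernoulli sum that the integral approximates. A minimal sketch follows; the function names are illustrative, and `math.erf` from the Python standard library takes the place of the printed table.

```python
from math import comb, erf, sqrt

def exact_within(two_n, s):
    """Exact probability that the number of heads (or girls) in a sample
    of size 2n lies between n - s and n + s inclusive, with p = 1/2."""
    n = two_n // 2
    favourable = sum(comb(two_n, k) for k in range(n - s, n + s + 1))
    return favourable / 2**two_n

def erf_approximation(two_n, s):
    """Erf(s / sqrt(n)), the approximation derived in the text."""
    return erf(s / sqrt(two_n / 2))

print(erf_approximation(1800, 50), exact_within(1800, 50))
```

The two figures differ only in the third decimal place, the small discrepancy arising from replacing the sum by an integral.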

Ex. 2. If we define a ‘fair sample’ of size 2n of a population of coins
as one whose discrepancy from n heads is exceeded only in 5 cases out
of 100, what is the discrepancy allowable in a fair sample?

Here we have to find s in terms of n from the definition that the
probability of the discrepancy being exceeded is 5/100 = 0.05, so that
the probability of its not being exceeded is 0.95.

Thus

$$\operatorname{Erf}(s/\sqrt{n})=\frac{2}{\sqrt{\pi}}\int_{0}^{s/\sqrt{n}}e^{-x^{2}}\,dx=0.95.$$

From the table we find that $s/\sqrt{n}=1.39$.

Ex. 3. What should be the discrepancy such that as many cases have
less than this as greater?

Here we require $\operatorname{Erf}(s/\sqrt{n})=0.5$,

whence $s/\sqrt{n}=0.48$.

Thus, if 2n = 800, s = 9.6, i.e. the range (390, 410) should include
about half the number of cases.
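Where the text reads the argument of Erf off a table, the same value can be found by bisection, since Erf increases steadily from 0 towards 1. The sketch below is an illustrative substitute for the table; `inverse_erf` is a hypothetical helper name.

```python
from math import erf, sqrt

def inverse_erf(target, tolerance=1e-10):
    """Value of x >= 0 at which Erf x equals the target, found by bisection."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must satisfy 0 <= target < 1")
    low, high = 0.0, 6.0  # Erf 6 is indistinguishable from 1 in floating point
    while high - low > tolerance:
        middle = (low + high) / 2
        if erf(middle) < target:
            low = middle
        else:
            high = middle
    return (low + high) / 2

ratio = inverse_erf(0.5)        # s / sqrt(n) for an even chance
print(ratio, ratio * sqrt(400)) # discrepancy s when 2n = 800
```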

Ex. 4. A penny is tossed 100 times, giving 45 heads and 55 tails.
On the assumption that this is a sample of a large population containing
equal numbers of heads and tails, find the percentage of cases in which
a deviation at least as large as this will be expected.

We have 2n = 100, s = 5, so that the probability of such cases is

$$1-\frac{2}{\sqrt{\pi}}\int_{0}^{0.707}e^{-x^{2}}\,dx=1-0.68262=0.31738.$$

Hence the percentage of cases is about 32.

The General Case 

We pass now to the general case in which the probability of
a certain subclass of a given population is p. We have already
shown (p. 62) that out of a sample of size n, the most probable
number of members of the subclass is np, or an integer†
differing from np by less than unity if np is not itself an integer.
We now seek the probability that in a large sample of size n the
number r of members of the subclass differs by an amount x from
the most probable number.

The probability of just r members occurring is

$$P=\frac{n!}{r!\,(n-r)!}\,p^{r}(1-p)^{n-r}.$$

Write r = pn+x, n-r = (1-p)n-x. Since n is large, r is large
also, provided that x is small compared with np.

Using Stirling’s formula and expanding in descending powers
of n, we have

$$\log P=\log n!+(pn+x)\log p+\{(1-p)n-x\}\log(1-p)-\log(pn+x)!-\log\{(1-p)n-x\}!$$

Thus

$$P=\frac{1}{\sqrt{2\pi(1-p)pn}}\,e^{-\{x^{2}+(1-2p)x\}/2np(1-p)},\ \text{approximately.}$$

If |x| is much greater than |1-2p|, we can neglect the term
(1-2p)x in comparison with $x^{2}$, in the exponent. We then
obtain the approximation

$$P=\frac{1}{\sqrt{2\pi p(1-p)n}}\,e^{-x^{2}/2np(1-p)}$$

to the probability that a sample of large size n will contain
[pn]+x members of the subclass whose probability is p, where
$pn\gg|x|\gg|1-2p|$.

This result is also valid for x = 0, for which the probability is
a maximum; that is, [pn] is the most probable number of members
of the subclass, and the probability that a sample of size n will
have just this number is

$$\frac{1}{\sqrt{2\pi p(1-p)n}}.$$

Thus, the probability that a sample of size n will have a
number of members of the subclass lying in the range (pn-s,
pn+s) is the sum of the probabilities that the sample will have
precisely [pn]+s, [pn]+s-1,..., [pn]-s members of the
subclass. Thus the required probability is

$$\sum_{x=-s}^{s}\frac{1}{\sqrt{2\pi p(1-p)n}}\exp\left\{-\frac{x^{2}+(1-2p)x}{2np(1-p)}\right\}.$$

† This will be denoted by [np].

We notice that

$$\exp\left\{-\frac{x^{2}+(1-2p)x}{2p(1-p)n}\right\}=\exp\left\{-\frac{x^{2}}{2p(1-p)n}\right\}\left\{1-\frac{(1-2p)x}{2p(1-p)n}+\dots\right\}.$$

Since the summation extends to equal numbers of positive
and negative terms, the second term in the brackets vanishes.
Thus the probability required is approximately

$$\sum_{x=-s}^{s}\frac{1}{\sqrt{2\pi p(1-p)n}}\exp\left\{-\frac{x^{2}}{2np(1-p)}\right\}.$$

Write $y=x/\sqrt{2np(1-p)}$; then since x increases by unity,
we have

$$y+\delta y=(x+1)/\sqrt{2np(1-p)},$$

so that

$$\delta y=\frac{1}{\sqrt{2np(1-p)}},\qquad\frac{\delta y}{\sqrt{\pi}}=\frac{1}{\sqrt{2\pi p(1-p)n}}.$$

The summation then takes the form

$$\sum_{y=-s/\sqrt{2np(1-p)}}^{s/\sqrt{2np(1-p)}}\frac{1}{\sqrt{\pi}}\,e^{-y^{2}}\,\delta y=\frac{2}{\sqrt{\pi}}\int_{0}^{s/\sqrt{2np(1-p)}}e^{-y^{2}}\,dy,\ \text{approximately.}$$

This result expresses the probability approximately in terms
of the error function.

Thus $P=\operatorname{Erf}\left[s/\sqrt{2np(1-p)}\right]$.
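The general result can be set against a direct summation of Bernoulli probabilities. The sketch below is illustrative and rests on two stated assumptions: the exact terms are evaluated through logarithms (`math.lgamma`) to avoid underflow for large n, and the range is centred on the nearest integer to np. The sample values n = 5,000, p = 0.1, s = 35 are chosen here for illustration and are not taken from the text.

```python
from math import erf, exp, lgamma, log, sqrt

def log_bernoulli(n, r, p):
    """Logarithm of nCr * p^r * (1 - p)^(n - r), via the log-gamma function."""
    return (lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
            + r * log(p) + (n - r) * log(1 - p))

def exact_within(n, p, s):
    """Exact probability that r lies within s of the nearest integer to np."""
    centre = round(n * p)
    low, high = max(0, centre - s), min(n, centre + s)
    return sum(exp(log_bernoulli(n, r, p)) for r in range(low, high + 1))

def erf_approximation(n, p, s):
    """Erf[s / sqrt(2 n p (1 - p))], the approximation derived in the text."""
    return erf(s / sqrt(2 * n * p * (1 - p)))

print(exact_within(5000, 0.1, 35), erf_approximation(5000, 0.1, 35))
```

For samples of this size the two values agree to about two decimal places.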

Ex. 1. If there are 32 females to 30 males in the general population,
what would be the most probable number, ceteris paribus, of women
students in a university population of 1,800? What is the probability
that the number of women students will be less than that number by 40?

The probability p of an individual being a female is $p=\tfrac{32}{62}=\tfrac{16}{31}$.

We have n = 1,800, s = 40, so that

$$s/\sqrt{2np(1-p)}=40/30,\ \text{approximately.}$$

From the table we find that Erf(4/3) = 0.94. Hence the probability
that the women students exceed the men by less than 40 is very great.

Ex. 2. Defining a fair sample as one whose discrepancy s from np is
exceeded, in excess or in defect, in only 10 per cent. of all cases, we have

$$\operatorname{Erf}\left[s/\sqrt{2np(1-p)}\right]=1-0.1=0.9.$$

From the table, $s=\sqrt{2np(1-p)}\times1.163$.

Thus, if $p=\tfrac{1}{10}$, n = 5,000, we obtain

$$s=\sqrt{10^{4}\cdot\tfrac{9}{100}}\times1.163=34.9.$$

Also np = 500. Thus a fair sample should, on this definition, have no
more than 534 and no less than 466 members belonging to the subclass.

EXAMPLES ON CHAPTER V

Ex. 1. If a penny is tossed 3 times, what is the probability of
obtaining 2 heads?

Ex. 2. What is the probability of throwing an ace exactly once in 
6 throws with a die ? 

Ex. 3. If m dice are thrown, show that the probability of obtaining
an even number of aces is $\tfrac12\{1+(\tfrac23)^{m}\}$.

Ex. 4. Drawings are made from a pack of 3 cards, of which 1 is red 
and 2 are black, and each time the card drawn is returned to the pack. 
If 10 such drawings are made, find the probability that n red cards will 
be chosen (n = 0, 1,..., 10), and show that it is most probable that 
n = 3. 

Ex. 5. Find the probability that in 8 throws of a die, the numbers 
1, 3, 5 turn up 2, 3, 3 times respectively. 

Ex. 6. A pack of 2n cards, n of which are red and n black, is divided
into two equal parts, and a card drawn from each. Find the probability
that the cards drawn are of the same colour, and compare with the
probability that two cards drawn from the original pack should be of
the same colour.

Ex. 7. A coin is tossed m+n times (m > n). Show that the
probability of at least m consecutive heads is $(n+2)/2^{m+1}$.

The required probability is the sum of the probabilities that there
should appear exactly m, m+1, m+2,..., m+n consecutive heads. Now
a series of m consecutive heads may begin at the first, second,..., (n+1)th
throw; and since m > n, there cannot occur more than one such series.
The probabilities of the first and last of these cases are evidently $1/2^{m+1}$,
and of the others $1/2^{m+2}$. Thus the probability of a series of exactly
m consecutive heads is

$$2/2^{m+1}+(n-1)/2^{m+2}=(n+3)/2^{m+2}.$$

Similarly, the probability of a series of m+1 consecutive heads is
$(n+2)/2^{m+3}$, and so on, up to m+n-2. Finally, the probability of a
series of exactly m+n-1 consecutive heads is $1/2^{m+n-1}$, and of m+n
consecutive heads is $1/2^{m+n}$.

Hence the required probability is

$$\frac{n+3}{2^{m+2}}+\frac{n+2}{2^{m+3}}+\dots+\frac{5}{2^{m+n}}+\frac{1}{2^{m+n-1}}+\frac{1}{2^{m+n}}.$$

The first n-1 terms of this expression form an arithmetico-geometric
series, the sum of which can be written down;† thus, we obtain for the
probability the value $(n+2)/2^{m+1}$.
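For small m and n the result of Ex. 7 can be confirmed by listing every possible sequence of tosses. The following brute-force sketch is illustrative only; the function names are not from the text, and the enumeration is practical only while m+n remains small.

```python
from fractions import Fraction
from itertools import product

def longest_run_of_heads(tosses):
    """Length of the longest unbroken series of heads in a sequence."""
    best = run = 0
    for toss in tosses:
        run = run + 1 if toss == "H" else 0
        best = max(best, run)
    return best

def probability_of_run(m, n):
    """Probability of at least m consecutive heads in m + n tosses,
    found by enumerating all 2^(m + n) equally likely sequences."""
    total = m + n
    hits = sum(1 for seq in product("HT", repeat=total)
               if longest_run_of_heads(seq) >= m)
    return Fraction(hits, 2**total)

print(probability_of_run(4, 3), Fraction(3 + 2, 2**(4 + 1)))
```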

Ex. 8 (Pascal’s problem). A and B play a game which must be either 
lost or won. If the probability that A wins any game is p , what is the 
probability that A wins m games before B wins n ? 

Evidently, the probability that B wins any game is q = 1 - p. Now
the required probability is that of A winning at least m games out of
a series of m+n-1, that is, by Bernoulli’s Theorem,

$${}^{m+n-1}C_{0}\,p^{m+n-1}+{}^{m+n-1}C_{1}\,p^{m+n-2}q+{}^{m+n-1}C_{2}\,p^{m+n-3}q^{2}+\dots+{}^{m+n-1}C_{n-1}\,p^{m}q^{n-1}.$$
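The Bernoulli sum for Pascal's problem can be checked against a step-by-step argument in which each game reduces by one the number of wins still needed by A or by B. Both routines below are illustrative sketches with hypothetical names; the recursion is not the method of the text but an independent check on it.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def by_bernoulli(p, m, n):
    """Probability that A wins at least m games out of m + n - 1."""
    p = Fraction(p)
    q = 1 - p
    total = m + n - 1
    return sum(comb(total, k) * p**k * q**(total - k)
               for k in range(m, total + 1))

def by_recursion(p, m, n):
    """Probability that A gains m wins before B gains n, game by game."""
    p = Fraction(p)
    q = 1 - p

    @lru_cache(maxsize=None)
    def wins(a_needs, b_needs):
        if a_needs == 0:
            return Fraction(1)
        if b_needs == 0:
            return Fraction(0)
        return p * wins(a_needs - 1, b_needs) + q * wins(a_needs, b_needs - 1)

    return wins(m, n)

print(by_bernoulli(Fraction(1, 2), 2, 3), by_recursion(Fraction(1, 2), 2, 3))
```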

Ex. 9. A bag contains m white and n black balls. If the balls are 
drawn out one by one, find the probability of drawing first a white and 
then a black, and so on, alternately, until all the balls remaining are of 
the same colour. 

If m balls are drawn out at once, what is the probability that these 
are white ? 

Ex. 10. Four cards are drawn from a pack of 52; find the probability
that they are all of different suits, (a) when each card is returned to the 
pack after the draw, (b) when it is not. 

Ex. 11. Given n independent events $A_1, A_2, \dots, A_n$, whose respective
probabilities are $p_1, p_2, \dots, p_n$, prove that the probability that at least
one of the events happens is $\sum p_1-\sum p_1p_2+\sum p_1p_2p_3-\dots$.

Ex. 12. With the notation of the previous example, show that the
probability that the events $A_1, A_2, \dots, A_r$, and no more, happen is
$p_1p_2\cdots p_r(1-p_{r+1})(1-p_{r+2})\cdots(1-p_n)$. Hence find

(i) the probability that r (and no more) of the events happen;

(ii) the probability that r at least of the events happen.

Ex. 13. Out of a family of n offspring consisting of two equally 
probable types, r at least of one type are just as likely to occur as not: 
find the value of r. 

The number r is determined by the equation

$$\left({}^{n}C_{r}+{}^{n}C_{r+1}+\dots+{}^{n}C_{n}\right)\frac{1}{2^{n}}=\frac12,$$

or

$$1+n+\frac{n(n-1)}{2!}+\dots+\frac{n(n-1)\cdots(r+1)}{(n-r)!}=2^{n-1}.$$

If n is even, there is no solution; but if n is odd, say 2m+1, then
r = m+1.


† See Chrystal, Algebra, ch. xx, 13.



CHAPTER VI 


EXTENSION TO CONTINUOUS DISTRIBUTIONS 
Definition 

Let $P_0P_1, P_1P_2, \dots, P_{l-1}P_l, \dots, P_{n-1}P_n$ represent a series of n
straight lines (or ‘elements’) to which the same measures of
length $\delta$ have been attached. Suppose that they are joined end
to end and that they are divided, for our purpose, into two
classes: the first class L is to consist of those elements, l in
number, lying to the left of $P_l$, and the second class R of those
lying to the right.

Fig. 4.

Then the probability that one of the set of
elements shall be a member of L is l/n. We may arrive at this
result in a different manner by inquiring what is the probability
that a point selected anywhere in one of the elements, otherwise
unspecified, shall lie in the class L; since such a point must lie
in one of the elements, the required probability is l/n, and

$$\frac{l}{n}=\frac{l\delta}{n\delta}=\frac{\text{length of subclass } L}{\text{length of class } L+R}.$$

This is true no matter how many members the class and the 
subclass may contain, and however the successive elements are 
orientated with respect to one another. 

Now let us suppose that to the total length $P_0P_n$ a measure
a has been attached and that to $P_0P_l$ a measure b has been
attached, so that $n\delta=a$ and $l\delta=b$; if $\delta$ is rational, then so are
a and b. Let us proceed to the limit, making $n\to\infty$ and $\delta\to0$.

It follows that, if $P_0P_lP_n$ is any continuous curve such that
a, b are the measures adopted for the arc-lengths $P_0P_n$ and
$P_0P_l$, then the probability that a point known to lie on the
arc $P_0P_n$ shall lie on the arc $P_0P_l$ is b/a. The probability that it
shall lie on the arc $P_lP_n$ is 1-(b/a).

If b and a are incommensurable (e.g. if $a=\sqrt{2}$, $b=\sqrt[3]{2}$) it
might appear that by no process of subdivision, in which each
element has a rational measure, could an arc $P_0P_l$ be obtained
as the limit of a number of elementary straight lines. But, in
fact, we may replace $P_l$ by any rational point P of section in
the element $P_{l-1}P_l$ since, on our definition of probability, it is
immaterial where $P_l$ lies in that element; and as the number n
of subdivisions tends to infinity, the distance $PP_l$ can be made
to differ from zero by less than any assigned positive quantity. Thus the
original proposition can be applied to irrational lengths of arc.

Analytically, if $y = f(x)$ is the equation to a curve passing
through the points $P_1$, $P_2$ (having abscissae $x = a_1, a_2$) and if
$Q_1$, $Q_2$ are internal points of the range (with abscissae
$x = b_1, b_2$), then the probability that a point known to lie on
the arc $P_1P_2$ shall also lie on the arc $Q_1Q_2$ is

$$\int_{Q_1}^{Q_2} ds \Big/ \int_{P_1}^{P_2} ds = \int_{b_1}^{b_2} \sqrt{1+f'(x)^2}\,dx \Big/ \int_{a_1}^{a_2} \sqrt{1+f'(x)^2}\,dx,$$

where $s$ is the arc-length of the curve measured from some fixed
point.

Fig. 5.

If $M_1$, $N_1$, $N_2$, $M_2$ are the feet of the ordinates at the four points,
as shown, then the probability that a point known to lie in the
range $M_1M_2$ shall also lie in the range $N_1N_2$ is $N_1N_2/M_1M_2$.

Ex. 1. As an illustration of the above results, consider a semicircle
of radius $r$, bounded by a diameter $M_1M_2$. First let us find the expectation
of the height of the ordinate $PN$ drawn from a point $P$ known to
lie on the arc $M_1M_2$ but otherwise unspecified. If $C$ is the centre of
the semicircle and the angle $PCN$ is $\theta$, then the probability that $P$ lies
on an elemental arc of measure $r\,d\theta$ is $r\,d\theta/\pi r$, by definition. And since
$PN = r\sin\theta$, the expected height of the ordinate is

$$\int_0^{\pi} r\sin\theta\,\frac{r\,d\theta}{\pi r} = \frac{r}{\pi}\int_0^{\pi}\sin\theta\,d\theta = \frac{2r}{\pi}.$$

Fig. 6.

Now let us find the expected height of the ordinate $PN$ erected at
a point $N$ known to lie in $M_1M_2$ but otherwise unspecified. If $CN = x$,
then $PN = \sqrt{r^2-x^2}$, and the expected height of the ordinate is

$$\int_{-r}^{r}\sqrt{r^2-x^2}\,\frac{dx}{2r} = \frac{1}{r}\int_0^{r}\sqrt{r^2-x^2}\,dx = \tfrac14\pi r.$$

Note the difference between these two results: to what is it due?
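
The two expectations of Ex. 1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the choice of the mid-point rule are assumptions made for illustration. The first function treats $P$ as uniformly distributed on the arc, the second treats $N$ as uniformly distributed on the diameter.

```python
import math

def mean_height_on_arc(r, steps=100_000):
    """Mean of r*sin(theta) when theta is uniform on (0, pi): P uniform on the arc."""
    h = math.pi / steps
    total = sum(r * math.sin((k + 0.5) * h) for k in range(steps))
    return total * h / math.pi

def mean_height_on_diameter(r, steps=100_000):
    """Mean of sqrt(r^2 - x^2) when x is uniform on (-r, r): N uniform on the diameter."""
    h = 2 * r / steps
    total = sum(math.sqrt(r * r - (-r + (k + 0.5) * h) ** 2) for k in range(steps))
    return total * h / (2 * r)

r = 1.0
arc_mean = mean_height_on_arc(r)
diameter_mean = mean_height_on_diameter(r)
print(arc_mean, 2 * r / math.pi)        # numerical value beside 2r/pi
print(diameter_mean, math.pi * r / 4)   # numerical value beside pi*r/4
```

The two averages differ because the same ordinates are weighted differently: equal elements of arc in the first case, equal elements of the diameter in the second.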

Ex. 2. A line $PQ$ is bisected at $R$. Two points $S$, $T$ are known to
lie on $PQ$. Find the probability that (1) they are on opposite sides of
$R$, (2) they are on the same side of $R$, (3) they are both to the right of $R$.

Applications to Weighted Probabilities 

Questions of geometrical probability arise in which, as in the
example previously considered (p. 61), some bias has to be
allowed for; thus, in the above formulation of our definition,
let us suppose quite generally that the element $P_0P_1$ is ‘weighted’
with a number $p_1$, that the element $P_1P_2$ is weighted with $p_2$, ...,
and that $P_{l-1}P_l$ is weighted with $p_l$. Then the probability that
an element of the class shall belong to $L$ is now

$$\sum_{i=1}^{l} p_i \Big/ \sum_{i=1}^{n} p_i.$$

Similarly, in the case of the continuous curve $P_0P_n$, if a point
$P$, whose position on the arc $P_0P_n$ is defined by a measure $s$ of
arc-length, is weighted by an amount $p(s)$, where $p(s)$ is some
function of $s$, then the probability that $P$ shall lie on the arc
$P_0P_r$ is

$$\int_{P_0}^{P_r} p(s)\,ds \Big/ \int_{P_0}^{P_n} p(s)\,ds.$$


Extension to Two Dimensions 

Suppose that a plane is divided into rectangles by lines drawn
parallel to the coordinate axes $Ox$, $Oy$. Consider a polygon
ABODEF bounded by sides of these rectangles, to each of
which a measure $\alpha$ of area has been attached, and suppose that
it contains $a$ of these rectangles. Let PQRST be a polygon
lying within ABODEF and bounded by sides of the same
rectangles, of which it contains $b$, suppose. Then if a rectangle
is known to be one of the class $a$, the probability that it shall
also be one of the subclass $b$ is

$$\frac{b}{a} = \frac{b\alpha}{a\alpha} = \frac{\text{area of polygon } PQRST}{\text{area of polygon } ABODEF}.$$

We may now pass, by a discussion analogous to the preceding, 
to the following 

Theorem. If $S$ is a simple closed curve, of area $a$, containing
a simple closed curve $S'$ of area $b$, then the probability that a point
lying in the region enclosed by $S$ shall also lie in the region
enclosed by $S'$ is $b/a$.

For the procedure by which this result is established we may
refer the reader to the usual method of obtaining the formula
for the area enclosed by a curve. By subdividing the area $S$
into a meshwork of elementary rectangles we thus obtain for
the required probability the formula

$$\iint_{S'} dx\,dy \Big/ \iint_{S} dx\,dy,$$

in which the integrals are taken up to the boundaries $S$ and $S'$.



Fig. 8. 


If the problem is one of weighted probability, we suppose
that to a point with coordinates $(x, y)$ situated within $S$, the
weight attached is some function $f(x, y)$ of its coordinates. Then
the probability required is

$$\iint_{S'} f(x, y)\,dx\,dy \Big/ \iint_{S} f(x, y)\,dx\,dy.$$

Discrete and Continuous Entities 

To illustrate the passage from a problem in probability dealing
with discrete entities to one concerning a continuous medium,
consider the following:

A population consists of elements forming two subclasses $b$
and $w$ in the proportion of $1 : T-1$. The number of elements
in any sample of magnitude $T$ is $n$. From this population is
drawn a sample of total magnitude $t$; we require to find the probability
that in this sample there is no member of the subclass $b$.

If we assume that $nt/T$ is an integer, it follows that the
number of elements in the sample of magnitude $t$ is $nt/T$. And
since the proportion of $b$ to the whole population is $1 : T$, the
probability that an element in a sample of magnitude $T$
belongs to $b$ is $1/n$; thus the probability that it does not belong
to $b$ is $1 - 1/n$. If we now consider the sample $t$, containing $nt/T$
elements, the probability that none of them belongs to $b$ is

$$\left(1-\frac1n\right)\left(1-\frac1n\right)\cdots\left(1-\frac1n\right) \text{ to } nt/T \text{ factors} = \left(1-\frac1n\right)^{nt/T}.$$

If $n$ is sufficiently large,† $\left(1-\frac1n\right)^{-n}$ is approximately $e$, where

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots = 2.71828\ldots$$

Hence the required probability is approximately $e^{-t/T}$.
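
The passage to the limit may be illustrated numerically. The sketch below is an illustration only; the function names and the values $t = 3$, $T = 2$ are assumptions chosen for the purpose. It compares the discrete probability $(1-1/n)^{nt/T}$ with $e^{-t/T}$ as $n$ increases.

```python
import math

def prob_no_b_discrete(n, t, T):
    """(1 - 1/n) raised to the number nt/T of elements in the sample."""
    return (1 - 1 / n) ** (n * t / T)

def prob_no_b_continuous(t, T):
    """The limiting value e^(-t/T)."""
    return math.exp(-t / T)

t, T = 3.0, 2.0   # hypothetical magnitudes, chosen only for illustration
for n in (10, 100, 1000, 100_000):
    print(n, prob_no_b_discrete(n, t, T), prob_no_b_continuous(t, T))
```

As $n$ grows the first column of probabilities approaches the second, which does not depend on $n$.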

If the original population be considered as a continuous one,
e.g. a volume of water or an interval of time or space, then the
number $n$ of elements in the sample may be made arbitrarily
large, and whatever the value of $t/T$, provided it is rational, we
can always assume that $nt/T$ is arbitrarily large and integral.
Thus, we can assert the following:

Theorem. If in any continuously varying process (varying
e.g. with respect to time, space, or volume) a certain characteristic
is present to the extent of one in $T$ units, then the probability that
the characteristic does not occur in a sample of $t$ units is $e^{-t/T}$.

Ex. 1. It is known that 100 litres of water have been polluted with
$10^6$ bacteria. If 1 c.c. of water is drawn off, what is the probability
that the sample is not polluted?

Since 100 litres $= 10^5$ c.c., it follows that $T = 10^5/10^6 = 10^{-1}$. Also
$t = 1$; so that the required probability is

$$e^{-10} = 0.000045, \text{ approximately.}$$

Ex. 2. An aircraft company carries on the average $P$ passengers
$M$ miles for every passenger killed. What is the probability of a passenger
completing a journey of $m$ miles in safety?

The fatal accidents occur once in $PM$ passenger miles. Hence the
probability that an accident should not occur in $m$ given passenger
miles is approximately $e^{-m/PM}$.

† For example, if $n = 1000$, the error in replacing $\left(1-\frac1n\right)^{-n}$ by $e$ does not
affect the second decimal place.

Ex. 3. Estimate from this result an apparently reasonable premium
to pay in order that, should a passenger be killed on such a flight, his
heir should receive £10,000.

Ex. 4. If during the week-end road traffic 100 cars per hour pass
along a certain road, each taking 1 minute to cover it, find the probability
that at any given instant no car will be on this road.

Evidently no car must have entered the road during the previous
minute. But on the average a car enters every $3600/100 = 36$ sec.

Thus the required probability is $e^{-60/36} = e^{-5/3}$.
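
The arithmetic of Ex. 1 and Ex. 4 follows directly from the theorem. The sketch below is illustrative only; the function name `prob_absent` is an assumption, not notation from the text.

```python
import math

def prob_absent(t, T):
    """Probability e^(-t/T) that a characteristic present once per T units is absent from t units."""
    return math.exp(-t / T)

# Ex. 1: 10^6 bacteria in 10^5 c.c., i.e. one per T = 0.1 c.c.; sample of t = 1 c.c.
p_water = prob_absent(t=1.0, T=1e5 / 1e6)

# Ex. 4: 100 cars per hour, i.e. one per T = 36 sec.; the road takes t = 60 sec. to cover
p_road = prob_absent(t=60.0, T=3600 / 100)

print(p_water, p_road)
```

The only step requiring care is the choice of units: $t$ and $T$ must be measured in the same unit (c.c. in the first case, seconds in the second).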

Ex. 5. In a completed book of 540 pages 624 typographical errors 
occur. What is the probability that 4 specimen pages selected for 
advertisement are free from errors ? 

Ex. 6. A series of cars of the same length and with the same speed 
proceed along a certain road, one every T seconds; and another series 
of cars identical in length and speed with the first, proceed along a road 
meeting the first road at right angles, one car passing every T' seconds. 
If a car takes t seconds to pass an observer, find the probability that 
there should be no collisions in an interval of time t. 

By ‘collision’ we mean in this case the situation of some portions of 
two cars, at the cross-roads, at the same instant. 

The required probability is evidently the sum of the following separate
probabilities:

(1) the probability that no car on the first road is passing the cross-roads
in the interval $t$, and that a car on the second road is passing
the cross-roads in that interval;

(2) the probability that no car on the second road is passing in the
interval and that a car on the first road is passing;

(3) the probability that no car passes the cross-roads on either road
during the interval.

Hence the probability is

$$e^{-t/T}(1-e^{-t/T'}) + e^{-t/T'}(1-e^{-t/T}) + e^{-t/T}e^{-t/T'} = e^{-t/T} + e^{-t/T'} - e^{-t(1/T+1/T')}.$$

Ex. 7. Criticize the following statements:

(1) The sun rises once per day; hence the probability that it will not
rise to-morrow is $e^{-1}$.

(2) The probability that it will rise at least once is $1 - e^{-1}$.

The ‘Random Walk’ Problem

We begin with the simple case in one dimension. An individual
is constrained to move backwards and forwards in a
straight line, each step being of length $l$, it being at each stage
equally probable that the step will be taken forward or backward.
We inquire what is the probability that after $n$ steps his
displacement will lie between $a$ and $a+da$, where $n$ is large.

Let $a = ml$; then clearly we have to calculate the probability
$P$ that out of $n$ steps $\tfrac12(n+m)$ will be forward and $\tfrac12(n-m)$
backward. The probability of each step being $\tfrac12$, by Bernoulli’s
Theorem

$$P = \frac{n!}{[\tfrac12(n+m)]!\,[\tfrac12(n-m)]!}\,\frac{1}{2^n} \approx \sqrt{\frac{2}{\pi n}}\;e^{-m^2/2n},$$

by applying Stirling’s approximation for large values of $n$.

Thus, since successive possible values of $a$ differ by $2l$, the
probability that the displacement will lie between
$a$ and $a+da$ is

$$\frac{1}{\sqrt{2\pi n l^2}}\,e^{-a^2/2nl^2}\,da.$$

The mean square distance $\sigma^2$ is then given by†

$$\sigma^2 = \frac{1}{\sqrt{2\pi n l^2}}\int_{-\infty}^{+\infty} a^2 e^{-a^2/2nl^2}\,da = nl^2,$$

or $\sigma = l\sqrt{n}$, and the required probability is

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-a^2/2\sigma^2}\,da.$$
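
The quality of Stirling’s approximation in this problem can be judged by direct computation. The sketch below is an illustration under stated assumptions: the step length is taken as $l = 1$, the function names are hypothetical, and $n = 1000$ is an arbitrary choice. It compares the exact binomial probability with $\sqrt{2/\pi n}\,e^{-m^2/2n}$ and verifies that the mean square displacement is $nl^2$.

```python
import math

def exact_prob(n, m):
    """Exact probability of a net displacement of m steps after n steps."""
    if (n + m) % 2 or abs(m) > n:
        return 0.0                      # m must have the same parity as n
    return math.comb(n, (n + m) // 2) / 2 ** n

def stirling_prob(n, m):
    """Approximation sqrt(2/(pi n)) * exp(-m^2 / 2n) for large n."""
    return math.sqrt(2 / (math.pi * n)) * math.exp(-m * m / (2 * n))

n = 1000
for m in (0, 20, 40, 60):
    print(m, exact_prob(n, m), stirling_prob(n, m))

# mean square displacement in units of l^2; the theory gives n
mean_square = sum(m * m * exact_prob(n, m) for m in range(-n, n + 1))
print(mean_square)
```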

We pass now to the two-dimensional case. 

A man walks a distance $OO_1 = l_1$ from a point $O$ in any
direction and then walks a distance $O_1O_2 = l_2$ in any direction;
required the probability that the final point $O_2$ falls within
distances $r_1$ and $r_2$ of $O$, where $r_2 > r_1$.

Draw a circle of radius $l_2$ about $O_1$ cutting the circles of radii
$r_1$ and $r_2$ about $O$ at $P$ and $Q$. $O_2$ may fall anywhere on the
circle with centre $O_1$, and it will satisfy the required conditions
if it falls on the arc $PQ$. Hence the required probability is

$$p = \frac{PQ}{\pi l_2} = \frac{\angle PO_1Q}{\pi} = \frac{1}{\pi}\left[\cos^{-1}\frac{l_1^2+l_2^2-r_2^2}{2l_1l_2} - \cos^{-1}\frac{l_1^2+l_2^2-r_1^2}{2l_1l_2}\right].$$
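
The formula for the two-stage walk can be compared with a direct count over directions of the second stage. The following sketch is illustrative only: the function names and the sample values $l_1 = l_2 = 1$, $r_1 = 0.5$, $r_2 = 1.5$ are assumptions. The cosine is clamped to $(-1, 1)$ so that radii outside the attainable range give the angle $0$ or $\pi$.

```python
import math

def prob_between(l1, l2, r1, r2):
    """Probability, by the arc-cosine formula, that O_2 lies between distances r1 and r2 of O."""
    def angle(r):
        c = (l1 * l1 + l2 * l2 - r * r) / (2 * l1 * l2)
        return math.acos(max(-1.0, min(1.0, c)))
    return (angle(r2) - angle(r1)) / math.pi

def prob_between_numeric(l1, l2, r1, r2, steps=200_000):
    """The same probability, by counting equally spaced directions of the second stage."""
    hits = 0
    for k in range(steps):
        phi = (k + 0.5) * 2 * math.pi / steps   # angle at O_1 between O_1O and O_1O_2
        d = math.sqrt(l1 * l1 + l2 * l2 - 2 * l1 * l2 * math.cos(phi))
        if r1 < d < r2:
            hits += 1
    return hits / steps

print(prob_between(1.0, 1.0, 0.5, 1.5), prob_between_numeric(1.0, 1.0, 0.5, 1.5))
```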

Ex. 1. If $l_1 = l_2 = l$, the probability that the final position lies
between a distance $r$ and $r+dr$ from $O$ is

$$\frac{2}{\pi}\,\frac{dr}{\sqrt{4l^2-r^2}}.$$

Ex. 2. Two points $P$ and $Q$ are at distance $l_1$ apart. A man walks
from $Q$ in a straight line to a point $R$ which is then found to be a distance
$l_2$ from $P$. What is the probability that the distance $QR$ lies between $l_3$
and $l_4$?

What is the probability that $QR$ lies between $\lambda$ and $\lambda+d\lambda$?

† See Chap. VIII.

ILLUSTRATIVE EXAMPLES 

Ex. 1. Two points are selected in a line $AC$ of length $a$, so
as to lie on opposite sides of its mid-point $O$.

Find the probability that the distance between them is less 
than \a . 

Let $P$ and $Q$ be the points and let $OP = x$, $QO = y$.

We thus require 

x+V < 4 ®. 

The conditions of the problem require further that $x < \tfrac12 a$, $y < \tfrac12 a$.



If we represent x and y by Cartesian coordinates, it is clear 
that x and y may lie anywhere within the square shown, while 
the values of x and y which satisfy the condition x-\-y < \a lie 
in the shaded area. 

(j2 I Qp 2 

Hence the required probability = — — = -. 

Ex. 2. A line of given length is divided into three parts.
Find the probability that these will form the sides of a triangle.

Let AH be the line, of length $a$, and let the three parts be
$x$, $y$, and $a-(x+y)$.

Then we require

$$x+y > a-(x+y),$$
$$x+a-(x+y) > y,$$
$$y+a-(x+y) > x.$$

These conditions are equivalent to $x+y > \tfrac12 a$, $x < \tfrac12 a$, $y < \tfrac12 a$. In
any case we have the condition $x+y < a$.

Hence, if we represent $x$ and $y$ by Cartesian coordinates, as
before, and the lines $BD$, $AE$ by $x+y = a$, $x+y = a/2$, respectively,
the required probability is evidently

$$\frac{\text{area } ACE}{\text{area } OBD} = \frac14.$$

Fig. 10.
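
The ratio of the two areas can be confirmed by counting points of a fine mesh. The sketch below is illustrative only: the function name and the mesh size are assumptions, and it presumes (as the geometrical argument does) that every point of the triangle $x > 0$, $y > 0$, $x+y < a$ is equally weighted. Integer units are used so that no rounding enters the comparisons.

```python
def triangle_probability(grid=400):
    """Fraction of mesh points (x, y) with x + y < a whose three parts form a triangle."""
    a = 2 * grid                      # integer units: the whole line has length 2*grid
    inside = total = 0
    for i in range(grid):
        x = 2 * i + 1                 # mid-point of the i-th cell
        for j in range(grid):
            y = 2 * j + 1
            z = a - x - y             # the third part
            if z <= 0:
                continue              # outside the region x + y < a
            total += 1
            if x + y > z and y + z > x and z + x > y:
                inside += 1
    return inside / total

print(triangle_probability())         # close to 1/4
```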


Ex. 3. Find the probability that the roots of the equation
$x^2+2px+q = 0$, where $-P \le p \le P$ and $-Q \le q \le Q$,
should be real.

Let $p$ and $q$ be represented by Cartesian coordinates, so that
they are restricted to lie in the rectangle shown. The condition
for the reality of the roots is $p^2 \ge q$; thus $p$ and $q$ must be such
that the point $(p, q)$ lies on the lower side of the parabola $y = x^2$.

Fig. 11.

There are two cases to distinguish, according as $P^2 < Q$ or
$P^2 > Q$.

(i) If $P^2 < Q$, the shaded area $= 2\int_0^P y\,dx + 2PQ$, the
integral being taken along the parabola. Thus the area is
$\tfrac23 P^3 + 2PQ$, and the required probability is therefore

$$\left(\tfrac23 P^3 + 2PQ\right)\Big/4PQ = \frac12 + \frac{P^2}{6Q}.$$

Fig. 12. Case (ii).

(ii) If $P^2 > Q$, the shaded area $= 4PQ - 2\int_0^Q x\,dy = 4PQ - \tfrac43 Q^{3/2}$,
and the probability is

$$\left(4PQ - \tfrac43 Q^{3/2}\right)\Big/4PQ = 1 - \frac{\sqrt{Q}}{3P}.$$
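
Both cases of Ex. 3 can be verified by numerical integration over the rectangle. The sketch below is an illustration; the function names and the sample values of $P$ and $Q$ are assumptions. For each $p$ it takes the fraction of the range $(-Q, Q)$ of $q$ for which $q \le p^2$, and averages over $p$.

```python
import math

def prob_real_roots(P, Q):
    """Closed forms: 1/2 + P^2/(6Q) when P^2 <= Q, otherwise 1 - sqrt(Q)/(3P)."""
    if P * P <= Q:
        return 0.5 + P * P / (6 * Q)
    return 1 - math.sqrt(Q) / (3 * P)

def prob_real_roots_numeric(P, Q, steps=200_000):
    """Mid-point rule: average over p of the fraction of q in (-Q, Q) with q <= p^2."""
    h = 2 * P / steps
    total = 0.0
    for k in range(steps):
        p = -P + (k + 0.5) * h
        total += (min(p * p, Q) + Q) / (2 * Q)
    return total * h / (2 * P)

for P, Q in ((1.0, 2.0), (3.0, 2.0)):   # one example of each case
    print(P, Q, prob_real_roots(P, Q), prob_real_roots_numeric(P, Q))
```

The two closed forms agree when $P^2 = Q$, each giving $\tfrac23$.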


Ex. 4. If two points $P$, $Q$ are taken in a circle, what is the
probability that the circle with centre $P$ and radius $PQ$ will
lie inside the original circle?

Let the radius of the original circle be $a$ (Fig. 13); then the
probability that $P$ lies in an annulus of breadth $dx$ at a distance
$x$ from the centre $O$ is

$$\frac{2\pi x\,dx}{\pi a^2} = \frac{2x\,dx}{a^2}.$$

The second circle will lie inside the first if $PQ < PN$, where
$PN = a-x$. Thus $Q$ may lie anywhere within a circle with
centre $P$ and radius $a-x$, the probability of which is
$\pi(a-x)^2/\pi a^2$. Hence the required probability is

$$\int_0^a \frac{2x\,dx}{a^2}\cdot\frac{(a-x)^2}{a^2} = \frac{2}{a^4}\int_0^a x(a-x)^2\,dx = \frac{2}{a^4}\left[\frac{a^4}{2} - \frac{2a^4}{3} + \frac{a^4}{4}\right] = \frac16.$$




Ex. 5. Buffon's problem. A smooth table is ruled with
parallel lines at distance $a$ apart. A needle of length $l < a$ is
dropped on the table. What is the probability that it will cross
one of the lines?

Take one of the parallel lines for $x$-axis and any perpendicular
to it for $y$-axis (Fig. 14). The probability that the centre of the
needle has an ordinate lying between the limits $y$ and $y+dy$ is
$dy/a$; and the probability that the inclination of the needle to $Oy$
should be between $\theta$ and $\theta+d\theta$ is $d\theta/\pi$. Hence the probability
that the needle will cross $Ox$ is

$$\iint \frac{dy\,d\theta}{a\pi},$$

where the double integral is taken over the range of values
of $y$ and $\theta$ for which the needle will cross $Ox$. The possible values
of $y$ are evidently given by $|y| < \tfrac12 l\cos\theta$, and $\theta$ lies in the
range $-\tfrac12\pi < \theta < \tfrac12\pi$. Thus, from Fig. 15, where $DEA$ is
the curve $y = \tfrac12 l\cos\theta$ and $AB$ is of length $\tfrac12 a$, the required
probability is

$$\frac{\text{area } AED}{\text{area } ABCD} = \frac{l}{\tfrac12 a\pi} = \frac{2l}{a\pi}.$$
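
Buffon's result lends itself to a simple simulation. The sketch below is illustrative only: the function name, the number of trials, and the fixed seed are assumptions. The ordinate of the centre is taken uniformly within half a spacing of the nearest line, and the inclination uniformly between $-\tfrac12\pi$ and $\tfrac12\pi$, as in the argument above.

```python
import math
import random

def buffon_estimate(l, a, trials=200_000, seed=1):
    """Proportion of simulated needles (length l < a) that cross a line."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.uniform(-a / 2, a / 2)                   # ordinate of the centre from the nearest line
        theta = rng.uniform(-math.pi / 2, math.pi / 2)   # inclination of the needle to Oy
        if abs(y) < 0.5 * l * math.cos(theta):
            hits += 1
    return hits / trials

l, a = 1.0, 2.0
print(buffon_estimate(l, a), 2 * l / (a * math.pi))   # simulated proportion beside 2l/(a*pi)
```

Being a sampling experiment, the estimate agrees with $2l/a\pi$ only to within the usual statistical fluctuation.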

Ex. 6. Consider the same problem in the case where l > a. 

Ex. 7. A point $P$ is chosen on a line $AB$ of length $2a$. What is the
probability that $AP \cdot PB$ should exceed $\lambda a^2$, where $\lambda$ is a given positive
number?

Ex. 8. A point is chosen on each of two adjacent sides of a square. 
Show that the average area of the triangle formed by the sides of the 
square and the line joining the two points is one-eighth of the area of 
the square. 

Ex. 9. Three points are chosen on the circumference of a circle. What 
is the probability that they lie on the same semicircle ? 

Ex. 10. Find the probability that the equation

$$x^{2n+1} = (2n+1)px + 2nq,$$

where $n$ is a positive integer and $0 \le p \le P$, $-Q \le q \le Q$,
should have three of its roots real.

Fig. 16. Case (i).

By plotting the curves $y = x^{2n+1}$, $y = (2n+1)px+2nq$, it is
easily seen that the condition for reality of the roots is $p^{2n+1} \ge q^{2n}$.

We now represent $p$ and $q$ by Cartesian coordinates $(x, y)$,
whence, as in Ex. 3, it follows from the diagrams shown
that the required probability is $\dfrac{\text{area } OEF}{\text{area } ABCD}$. Thus two cases
arise. In Case (i), the area

$$OEF = 2\int_0^P x^{(2n+1)/2n}\,dx = \frac{4n}{4n+1}\,P^{(4n+1)/2n},$$

so that the probability is

$$\frac{2n}{4n+1}\,\frac{P^{(2n+1)/2n}}{Q}.$$

In Case (ii), the area

$$OEF = 2PQ - 2\int_0^Q y^{2n/(2n+1)}\,dy = 2PQ - \frac{2(2n+1)}{4n+1}\,Q^{(4n+1)/(2n+1)},$$

and the probability is therefore

$$1 - \frac{2n+1}{4n+1}\,\frac{Q^{2n/(2n+1)}}{P}.$$


Ex. 11. Find the probability that the solutions of the simul¬ 
taneous differential equations 


- »■ 


2(a-b)z+^+by = 0, 

where $0 < a < A$, $0 < b < B$, represent decaying oscillations.
Eliminating $y$ from the equations, we obtain for $x$ the equation

$$\frac{d^2x}{dt^2} + 2(a-2b)\frac{dx}{dt} + b^2x = 0.$$

For a decaying oscillation we require $a > 2b$ and $(2b-a)^2 < b^2$.
This latter condition is equivalent to $(3b-a)(b-a) < 0$, so
that either

(i) $3b < a$, $b > a$, or (ii) $3b > a$, $b < a$.

If we represent $a$ and $b$ by Cartesian coordinates $(x, y)$, their
total field of variation is the rectangle bounded by the axes and
$x = A$, $y = B$. For the conditions of the problem $a$ and $b$
must be represented by values of $(x, y)$ which lie between
$y = 0$ and $y = \tfrac12 x$ (the line $ON$), and also between $y = \tfrac13 x$
and $y = x$ (the lines $OL$, $OM$); it is clear that the condition (i)
cannot be fulfilled. If, for example, we suppose that $B > 2A$,
the required probability is evidently

$$\tfrac12\left(\tfrac12 A^2 - \tfrac13 A^2\right)\Big/AB = \frac{A}{12B}.$$


Ex. 12. What is the probability that the second figure in a
table of square roots of $x$ is $n$, if $x$ ranges from 0 to 1 and is
tabulated at equal intervals?

Let the first figure for any $x$ be $m$; then for success we require

$$0.mn \le \sqrt{x} < 0.m(n+1),$$

or

$$m + \frac{n}{10} \le 10\sqrt{x} < m + \frac{n+1}{10},$$

where $m$ may be 0, 1, ..., 9.

Thus out of the total range of $x$ within which it falls, viz. 0 to 1,
the second figure in $\sqrt{x}$ will be $n$ if $x$ falls in any one of the intervals
of length

$$\frac{1}{10{,}000}(20m+2n+1),$$

corresponding to $m = 0, 1, \ldots, 9$.

Hence the required probability is

$$P = \frac{1}{10{,}000}\sum_{m=0}^{9}(20m+2n+1).$$

Thus for $n = 0$, $P = 0.091$ and for $n = 9$, $P = 0.109$.
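
The sum in Ex. 12 is easily evaluated and can be compared with an actual tabulation. The sketch below is illustrative only: the function names, the number of entries, and the choice of tabulating at the mid-points $x = (k+\tfrac12)/N$ are assumptions. Integer arithmetic is used for the square roots so that no figure is misread through rounding.

```python
import math
from fractions import Fraction

def second_figure_prob(n):
    """(1/10000) * sum over m = 0..9 of (20m + 2n + 1)."""
    return Fraction(sum(20 * m + 2 * n + 1 for m in range(10)), 10_000)

def second_figure_frequency(n, entries=100_000):
    """Proportion of tabulated values of sqrt(x) whose second figure is n."""
    hits = 0
    for k in range(entries):
        # x = (k + 1/2)/entries; floor(100*sqrt(x)) = isqrt(floor(10000*x))
        two_figures = math.isqrt((10_000 * (2 * k + 1)) // (2 * entries))
        if two_figures % 10 == n:
            hits += 1
    return Fraction(hits, entries)

for n in (0, 9):
    print(n, float(second_figure_prob(n)), float(second_figure_frequency(n)))
```

The ten probabilities for $n = 0, 1, \ldots, 9$ sum to unity, as they must.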


Ex. 13. What is the probability that when log e x is tabulated 
for x — 1 to x = 0, at equal intervals of x, the second figure in 
the table will be 2 ? 


EXAMPLES ON CHAPTER VI 

Ex. 1. A defective measuring instrument slips one scale division each
time it is used. Find the probability that after being used 100 times it
will be no more than 6 divisions from the zero reading.

Ex. 2. Trains leave a station at 3, 5, 8, 10, 13,... minutes past the 
hour. Find the probability that a passenger arriving at the station has 
to wait less than a minute for a train. 

Ex. 3. A point $P$ is chosen on a line $AB$. What is the probability
that $AP : PB > \lambda$?

Ex. 4. Two points are taken in a circle. Find the probability that
the perpendicular from the centre on the line joining them does not
pass between them.

Ex. 5. On a chess-board, the squares of which are of side $a$, there is
thrown a coin of diameter $b$, so as to lie entirely on the board, which
includes a border of width $c$. Find the probability that it will lie
entirely on one square ($a > b > c$).

Ex. 6. A floor is paved with tiles, each tile being a parallelogram 
such that the distances between pairs of opposite sides are a and b 
respectively, the length of the diagonal being 1. A stick of length c falls 
on the floor parallel to this diagonal. Show that the probability that 

it will lie entirely on one tile is — |) • 

If a circle of diameter $d$ is thrown on the floor, show that the probability
that it will lie on one tile is $\left(1-\dfrac{d}{a}\right)\left(1-\dfrac{d}{b}\right)$.

Ex. 7. A sheet of perforated zinc in the form of a square 22 cm. in 
width is covered with ten rows and ten columns of holes each 1 cm. 
in diameter, the centres in the rows and the columns being evenly spaced 
at intervals of 2 cm. 

What is the probability that a particle of sand (considered as a point) 
blown against the zinc sheet will pass through to the other side ? 

What is the probability that a small shot of diameter £ cm. fired 
against the zinc without sufficient force to penetrate the metal will pass 
through one of the holes ? 

Ex. 8. A disk of wood of radius R and thickness d is cut so that it 
finally consists of four blades or sectors, each of 30°, radiating from the 
centre and evenly spaced. The disk is then set spinning with angular 
velocity w about an axis through the centre at right angles to the disk; 
a shot is fired with velocity V parallel to and at distance r < R from 
the axis. Find the probability that the shot will pass without damaging 
the blades of the disk. 

Ex. 9. A point $P$ lies inside a circle of diameter $AB$. What is the
probability

(1) that $\pi > \angle APB > \alpha > \tfrac12\pi$,

(2) that $\tfrac12\pi > \angle PAB > \alpha > 0$,
where $\alpha$ is a given angle?

Ex. 10. Three chords are drawn through the same point of a circle. 
What is the probability that all three lines cut the same semicircle? 

Ex. 11. A particle oscillates harmonically with period $T$ between two
points $A$ and $B$ distant $2a$ apart. What is the probability that during
a small interval of time $t$ the particle will be found within a small
distance $b$ of the point $B$?

Ex. 12. A raindrop falls steadily down a window-pane of total height 
H. At every distance h a grease spot deflects it by an amount d to the 
right or left. What is the probability that by the time it reaches the 
bottom it will have been deflected from its original direction of descent 
by an amount D ? 



CHAPTER VII 


THE THEORY OF ARRANGEMENTS (2) 

In the following theorems we are dealing with a series of problems
that can perhaps be described best in this way. Let there
be a row of pigeon-holes into which it is proposed to place a set
of objects which may or may not differ from one another. The
result of the distribution may be that some pigeon-holes contain
objects and some do not. Thus the number of ways in
which a distribution can be effected will depend upon two
factors:

(1) Whether the order of the pigeon-holes, even including 
blanks, is taken into account. 

(2) Whether the order of the objects within the pigeon-holes 
is taken into account. 

The set of objects in a pigeon-hole will be called a group or a 
parcel according as the order of the objects is or is not taken into 
consideration. Unless otherwise stated, it is to be assumed 
throughout that the order of the pigeon-holes is significant. 

Suppose that we are given $n$ different objects in a row and
that these are divided by $r-1$ partitions into groups which may
range in size from 0 to $n$. In how many ways can this division
be accomplished? Altogether, counting objects and partitions,
we have $n+r-1$ entities, and if these are permuted among themselves
we shall obtain the required number $N$ of distributions,
provided we make allowance for the fact that the interchange
of two partitions does not alter the result. Thus we have to
permute $n+r-1$ objects among themselves, $r-1$ of them being
alike; so that, by the theorem (p. 42),

$$N = (n+r-1)!/(r-1)! = r(r+1)\cdots(r+n-1).$$

Hence 

Theorem I. The number of ways in which n different objects 
can be arranged in r or fewer groups is r(r+l)...(r+n—1). 
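
Theorem I can be checked for small numbers by listing the arrangements outright. The sketch below is an illustration only (the function names are assumptions): it follows the argument of the text, permuting the $n$ objects together with $r-1$ indistinguishable partitions, here written `"|"`, and counting the distinct orders.

```python
from itertools import permutations
from math import prod

def count_groupings(n, r):
    """Distinct orders of n different objects and r-1 identical partitions."""
    entities = list(range(n)) + ["|"] * (r - 1)
    return len(set(permutations(entities)))

def theorem_one(n, r):
    """r(r+1)...(r+n-1)."""
    return prod(range(r, r + n))

for n, r in ((3, 2), (3, 3), (4, 3)):
    print(n, r, count_groupings(n, r), theorem_one(n, r))
```

For $n = 3$, $r = 2$ both counts give $2 \cdot 3 \cdot 4 = 24$.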

Ex. 1. Show that there are 24 ways of displaying 3 flags on 2 masts,
when all the flags must be displayed but both masts need not be used.

Ex. 2. Show by means of Stirling’s theorem that when $n$ is large
compared with $r$, the value of $N$ in Theorem I is $\sqrt{2\pi}\,n^{n+r-\frac12}e^{-n}/(r-1)!$,
approximately.

Now let us impose the restriction that each of the $r$ groups
must contain at least one object. To find the number of ways in
which the distribution can be made, we begin by selecting $r$ of
the $n$ objects and placing one in each of the $r$ compartments;
since the objects are all different, this selection can be made in
$^nP_r$ ways. For each such arrangement the problem now resolves
itself into the preceding, for there remain $n-r$ objects to be
distributed into $r$ or less groups. Hence the total number $N$ of
ways is given by

$$N = {}^nP_r \cdot r(r+1)\cdots(r+n-r-1) = \frac{n!\,(n-1)!}{(n-r)!\,(r-1)!}.$$

Thus,

Theorem II. The number of ways in which $n$ different objects
can be arranged in exactly $r$ groups is $n!\,(n-1)!/(n-r)!\,(r-1)!$.

We note that when n = r, this reduces to n!, as expected. 
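
Theorem II may be tested in the same manner, by rejecting those orders of objects and partitions in which some group is left empty. The sketch below is illustrative only; the function names are assumptions and the enumeration is practicable only for small $n$ and $r$.

```python
from itertools import permutations
from math import factorial

def count_exact_groups(n, r):
    """Orders of n objects and r-1 identical partitions in which no group is empty."""
    entities = list(range(n)) + ["|"] * (r - 1)
    count = 0
    for arrangement in set(permutations(entities)):
        sizes, current = [], 0
        for item in arrangement:
            if item == "|":           # a partition closes the current group
                sizes.append(current)
                current = 0
            else:
                current += 1
        sizes.append(current)
        if all(size > 0 for size in sizes):
            count += 1
    return count

def theorem_two(n, r):
    """n!(n-1)! / ((n-r)!(r-1)!)."""
    return factorial(n) * factorial(n - 1) // (factorial(n - r) * factorial(r - 1))

for n, r in ((3, 3), (4, 2), (5, 3)):
    print(n, r, count_exact_groups(n, r), theorem_two(n, r))
```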

Ex. 1. A builder has been asked to deliver 10 different consignments 
of materials on 4 successive days, at certain specified times. If he omits 
to record the details of the order in which the materials should be sent, 
what is the probability that he executes the order correctly ? 

Ex. 2. Applying Stirling’s theorem to the result of Theorem II when
$n$ is large compared with $r$, show that the approximate value of $N$ is
$\sqrt{2\pi}\,n^{n+r-\frac12}e^{-n}/(r-1)!$,
as in Theorem I, Ex. 2.

Ex. 3. By estimating the approximations in Theorems I and II to 
a higher degree of accuracy, determine the proportion of the total 
number of ways which arise from the assumption that fewer than r 
groups may be employed. 

The last proposition can easily be generalized. If we wish to
arrange $n$ different objects into $r$ groups so that each group
contains at least $s$ objects, we begin by selecting $rs$ objects and
placing $s$ in each of the $r$ groups. Since this selection can be
made in $^nP_{rs}$ ways, we have the result:

Theorem III. The number of ways in which $n$ different objects
can be arranged in $r$ groups, each of which contains at least $s$
objects, is $^nP_{rs}\; r(r+1)\cdots(r+n-rs-1)$.

Now suppose that the $n$ objects which we wish to arrange in
$r$ different groups are identical. This means, of course, that we
are now dealing with parcels instead of groups. We begin by
placing the objects in a row; there will then be $n-1$ gaps
between them. If we indicate $r-1$ of these gaps we shall have
separated the objects into parcels, each parcel containing at
least one object. Thus the number of ways of forming such
parcels is the number of ways of indicating $r-1$ gaps among
the $n-1$, i.e. $^{n-1}C_{r-1}$. Hence,

Theorem IV. The number of ways in which $n$ identical objects
can be arranged in $r$ different parcels is $N = (n-1)!/(r-1)!\,(n-r)!$.

Note that, by the method† of Theorem XII, this number can
be obtained as the coefficient of $x^{n-r}$ in

$$(x^0+x^1+\cdots+x^{n-r})^r = (1-x^{n-r+1})^r/(1-x)^r,$$

i.e. in $(1-x)^{-r}$. Thus the coefficient is $^{n-1}C_{r-1}$, as before.
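
Both routes to Theorem IV can be followed by machine for small values. The sketch below is an illustration (function names are assumptions): the first function lists the sizes of $r$ non-empty parcels directly, and the second multiplies out $(1+x+\cdots+x^{n-r})^r$ and reads off the coefficient of $x^{n-r}$.

```python
from itertools import product
from math import comb

def count_parcels(n, r):
    """Number of ways of giving r parcels positive sizes that total n."""
    return sum(1 for sizes in product(range(1, n + 1), repeat=r) if sum(sizes) == n)

def coefficient_method(n, r):
    """Coefficient of x^(n-r) in (1 + x + ... + x^(n-r))^r."""
    poly = [1]
    base = [1] * (n - r + 1)
    for _ in range(r):
        new = [0] * (len(poly) + len(base) - 1)
        for i, a in enumerate(poly):
            for j, b in enumerate(base):
                new[i + j] += a * b
        poly = new
    return poly[n - r]

n, r = 10, 4
print(count_parcels(n, r), coefficient_method(n, r), comb(n - 1, r - 1))
```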

Ex. 1. During a period of shortage, $n$ tons of coal have to be distributed
among $r$ factories. What is the probability that a specified
factory is supplied with exactly $m$ tons?

By Theorem IV, the total number of ways in which the $n$ tons can
be supplied is $(n-1)!/(r-1)!\,(n-r)!$.

If $m$ tons are given to the specified factory, we have $n-m$ tons left
to distribute among the remaining $r-1$ factories, and this distribution
can be effected in $(n-m-1)!/(r-2)!\,(n-m-r+1)!$ ways. Hence the
required probability is

$$\frac{(r-1)(n-r)(n-r-1)\cdots(n-r-m+2)}{(n-1)(n-2)\cdots(n-m)}.$$

Thus, if $n = 10$, $r = 4$, $m = 3$, the probability is 5/28.
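
The value 5/28 can be confirmed by enumeration. The sketch below is illustrative only: the function names are assumptions, and it presumes, as the solution above does, that every factory receives at least one ton and that all distributions counted by Theorem IV are equally likely.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_factory_gets(n, r, m):
    """Ratio of the two counts given by Theorem IV."""
    return Fraction(comb(n - m - 1, r - 2), comb(n - 1, r - 1))

def prob_by_enumeration(n, r, m):
    """The same probability, by listing every allocation of whole tons."""
    allocations = [s for s in product(range(1, n + 1), repeat=r) if sum(s) == n]
    favourable = sum(1 for s in allocations if s[0] == m)   # s[0] is the specified factory
    return Fraction(favourable, len(allocations))

print(prob_factory_gets(10, 4, 3), prob_by_enumeration(10, 4, 3))
```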

Ex. 2. If $n$ is large compared with $r$, show that the number of arrangements
obtained in Theorem IV is approximately $n^{r-1}/(r-1)!$.

Ex. 3. Prove that the value of $r$ for which $N$ is greatest is the smallest
integer not less than $\tfrac12 n$.


From the last theorem we can find the number of arrangements
into $r$ or less parcels. For the number of such arrangements
is the number of ways in which $n+r$ objects can be
distributed into $r$ parcels, each containing at least one (one
object being afterwards removed from each parcel), whence

Theorem V. The number of ways in which $n$ identical objects
can be arranged in $r$ or fewer parcels is $(n+r-1)!/(r-1)!\,n!$.

Corollary. The number of ways in which $n$ identical objects
can be arranged in $r$ parcels, none of which contains less than
$q$ objects, is $^{n-rq+r-1}C_{r-1}$.

For we place $q$ objects in each of the $r$ parcels, leaving $n-rq$
objects to be arranged in $r$ or less parcels.
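
Theorem V and its corollary can both be checked by one short enumeration, since the theorem is the case $q = 0$ of the corollary. The sketch below is an illustration; the function names are assumptions.

```python
from itertools import product
from math import comb

def count_parcels_at_least(n, r, q):
    """Number of ways of giving r parcels sizes of at least q that total n."""
    return sum(1 for sizes in product(range(q, n + 1), repeat=r) if sum(sizes) == n)

def corollary(n, r, q):
    """The binomial coefficient C(n - rq + r - 1, r - 1)."""
    return comb(n - r * q + r - 1, r - 1)

for n, r, q in ((5, 3, 0), (5, 2, 1), (7, 3, 2)):
    print(n, r, q, count_parcels_at_least(n, r, q), corollary(n, r, q))
```

With $q = 0$ empty parcels are admitted, which is the sense of ‘$r$ or fewer parcels’ in Theorem V.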

Ex. 1. $n$ nuts are thrown daily into a cage containing $r$ squirrels. If
a squirrel to survive must have a ration of $m$ nuts at least per day, and
if in the struggle some get more than their share and others less, find
the probability that a certain squirrel will survive.

† See p. 98.

Ex. 2. Suppose that $n = 5$, $r = 2$, $q = 1$, and let the five objects be
denoted by letters $a$. Then the number of arrangements is evidently
$a, a^4$; $a^2, a^3$; $a^3, a^2$; $a^4, a$, i.e. four.

Given $n$ different objects we inquire in how many ways they
can be distributed into $r$ or less groups, not necessarily using all
the $n$ objects.

Suppose we select $x$ of the objects (such a selection can be
made in $^nC_x$ ways) and then distribute these objects among
themselves, as in Theorem I. In this way we obtain
$^nC_x\; r(r+1)\cdots(r+x-1)$ distributions; and since $x$ may vary
from 0 to $n$, the required number of ways is

$$N = \sum_{x=0}^{n} {}^nC_x\; r(r+1)\cdots(r+x-1) = \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\,\frac{(r+x-1)!}{(r-1)!} = \frac{n!}{(r-1)!}\sum_{x=0}^{n}\frac{(r+x-1)!}{x!\,(n-x)!}.$$

Now let us form the product of the two series

$$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots,$$

$$(1-x)^{-r} = 1 + rx + \frac{r(r+1)}{2!}x^2 + \cdots, \quad \text{where } |x| < 1.$$

The coefficient of $x^n$ in this product is

$$\frac{1}{n!} + \frac{1}{(n-1)!}\,\frac{r}{1!} + \frac{1}{(n-2)!}\,\frac{r(r+1)}{2!} + \cdots + \frac{r(r+1)\cdots(r+n-1)}{n!}$$

$$= \frac{1}{(r-1)!}\left[\frac{(r-1)!}{n!} + \frac{r!}{1!\,(n-1)!} + \frac{(r+1)!}{2!\,(n-2)!} + \cdots\right].$$

On comparing this expression with the above value of N we 
obtain the theorem: 

Theorem VI. The number of ways in which n different objects 
can be distributed into r or fewer groups , not necessarily using 
all the n objects , is the coefficient of x n in the expansion of 
n\e x (l—x)~ r . 

Ex. Thus, if $n = 2$, $r = 2$, the number of arrangements is the coefficient
of $x^2$ in $2e^x(1-x)^{-2}$, i.e. in

$$2(1+x+\tfrac12 x^2+\cdots)(1+2x+3x^2+\cdots).$$

Hence the required number is 11. As a verification we find that the
number of arrangements of two objects $a$ and $b$ is given by the
scheme:

a, 0; b, 0; ab, 0; ba, 0; a, b;

0, a; 0, b; 0, ab; 0, ba; b, a.

To these must be added the arrangement (0, 0) in which neither object
is chosen. Thus the total is 11, as before.
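
The agreement between the direct sum and the coefficient in $n!\,e^x(1-x)^{-r}$ can be verified with exact fractions. The sketch below is an illustration only; the function names are assumptions.

```python
from fractions import Fraction
from math import comb, factorial, prod

def direct_count(n, r):
    """Sum over x of C(n, x) * r(r+1)...(r+x-1), as in the derivation of Theorem VI."""
    return sum(comb(n, x) * prod(range(r, r + x)) for x in range(n + 1))

def coefficient_count(n, r):
    """n! times the coefficient of x^n in e^x (1 - x)^(-r)."""
    exp_coeffs = [Fraction(1, factorial(k)) for k in range(n + 1)]
    neg_binom = [Fraction(prod(range(r, r + k)), factorial(k)) for k in range(n + 1)]
    coeff = sum(exp_coeffs[n - k] * neg_binom[k] for k in range(n + 1))
    return factorial(n) * coeff

print(direct_count(2, 2), coefficient_count(2, 2))   # the case n = 2, r = 2 of the example
```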

Theorem VII. The number of ways in which $n$ different objects
can be arranged in exactly $r$ groups, not necessarily using all the
objects, is the coefficient of $x^{n-r}$ in the expansion of $n!\,e^x(1-x)^{-r}$.

For we place one of the objects in each of the $r$ groups, and
we have then to distribute the $n-r$ remaining objects into $r$ or
fewer groups, as in the last theorem. Hence also, when the
order of the groups among themselves is disregarded,

Theorem VIII. The number of ways in which $n$ different
objects can be arranged in $r$ indifferent groups, not necessarily
using all the objects, is the coefficient of $x^{n-r}$ in the expansion of

Suppose that we form n sets of letters from the set $a_1$, $a_2$,
$a_3$, ...; suppose that the letter $a_1$ occurs in $n_1$ of the sets, that
$a_2$ occurs in $n_2$ of the sets, while the number of sets containing
$a_1$ and $a_2$ is $n_{12}$.

Then the number of sets containing $a_1$ only is $n_1 - n_{12}$, the
number containing $a_2$ only is $n_2 - n_{12}$. Hence the number of
sets containing either $a_1$ or $a_2$ only is $n_1 + n_2 - 2n_{12}$, and the
number containing at least one of $a_1$, $a_2$ is

$$n_1 + n_2 - 2n_{12} + n_{12} = n_1 + n_2 - n_{12}.$$

It follows that the number of sets free from $a_1$ and $a_2$ is
$n - (n_1 + n_2) + n_{12}$.

Let us consider now three letters $a_1$, $a_2$, $a_3$; suppose that $n_3$ of
the sets contain $a_3$, that $n_{23}$ contain $a_2$, $a_3$, that $n_{31}$ contain
$a_3$, $a_1$, while $n_{123}$ contain $a_1$, $a_2$, $a_3$.

From the preceding result it follows that the number of sets
containing at least one of $a_2$, $a_3$ is $n_2 + n_3 - n_{23}$; and the number
containing at least one of $a_1a_2$, $a_1a_3$ is $n_{12} + n_{13} - n_{123}$. Hence
the number of sets containing at least one of $a_1$, $a_2$, $a_3$ is

$$n_1 + (n_2 + n_3 - n_{23}) - (n_{12} + n_{13} - n_{123})$$

$$= n_1 + n_2 + n_3 - (n_{12} + n_{23} + n_{31}) + n_{123}.$$

Reasoning inductively in this manner we obtain the following
general result:†

Theorem IX. If n sets of letters formed from $a_1$, $a_2$, $a_3$, ... are
such that the letter $a_i$ occurs in $n_i$ sets, the letters $a_i$, $a_j$ occur in $n_{ij}$
sets, the letters $a_i$, $a_j$, $a_k$ occur in $n_{ijk}$ sets, and so on, then the number
of sets free from $a_1$, $a_2$, ..., $a_r$ is

$$n - \sum_{i} n_i + \sum_{i<j} n_{ij} - \sum_{i<j<k} n_{ijk} + \dots \pm n_{12\ldots r},$$

the indices in each sum running from 1 to r.

Corollary 1. If none of the sets is free from $a_1$, $a_2$, ..., $a_r$, then

$$n - \sum_{i} n_i + \sum_{i<j} n_{ij} - \sum_{i<j<k} n_{ijk} + \dots \pm n_{12\ldots r} = 0.$$

Corollary 2. By similar reasoning it may be shown that the
number of sets containing one only of the letters $a_1$, $a_2$, $a_3$, ... is

$$\sum_{i} n_i - 2\sum_{i<j} n_{ij} + 3\sum_{i<j<k} n_{ijk} - \dots .$$

If $n_1 = n_2 = \dots = N_1$, say, and $n_{12} = n_{23} = \dots = N_2$, and
so on, the number of sets free from the specified letters is

$$N - rN_1 + \frac{r(r-1)}{2!}N_2 - \frac{r(r-1)(r-2)}{3!}N_3 + \dots \pm N_r,$$

where r is the number of letters in question, and N = n.
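Theorem IX lends itself to a direct check by machine. The sketch below is an illustration only: the seven sets of letters are hypothetical data chosen for the purpose, and the function names are our own. It forms the alternating sum of the theorem and compares it with a straightforward count of the sets that contain none of the specified letters.

```python
from itertools import combinations


def free_count_inclusion_exclusion(sets, letters):
    """n - sum n_i + sum n_ij - ... +/- n_{12...r}, as in Theorem IX."""
    total = 0
    for size in range(len(letters) + 1):
        sign = (-1) ** size
        for combo in combinations(letters, size):
            # Number of sets containing every letter of `combo`;
            # the empty combination counts all n sets.
            total += sign * sum(1 for s in sets if set(combo) <= s)
    return total


def free_count_direct(sets, letters):
    """Count the sets containing none of the specified letters."""
    return sum(1 for s in sets if not (set(letters) & s))


# Hypothetical example: seven sets of letters, specified letters a, b, c.
example_sets = [frozenset(s) for s in ("a", "d", "ab", "bc", "abc", "cd", "e")]
specified = "abc"

print(free_count_inclusion_exclusion(example_sets, specified))
print(free_count_direct(example_sets, specified))
```

Here n = 7, the single-letter counts sum to 9, the pair counts to 5, and the triple count is 1, so that 7 - 9 + 5 - 1 = 2; the two sets free from a, b, c are "d" and "e".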

Ex. 1. If the n given sets are $a_1$, $a_5$, $a_6$, $a_1a_3$, $a_2a_3$, $a_1a_4$, $a_1a_5$, $a_1a_2a_3$,
$a_1a_2a_6$, $a_3a_4a_5$, $a_1a_3a_4$, and the r specified letters are $a_1$, $a_2$, $a_3$, then

n = 11, r = 3, $\sum n_i = 15$, $\sum n_{ij} = 7$, $n_{123} = 1$. Thus the number of
sets free from $a_1$, $a_2$, $a_3$ is

11 - 15 + 7 - 1 = 2,

as is immediately verified.

Ex. 2. At a school of 1,000 children, groups were examined for defective
teeth, vision, and hearing, and the following results tabulated:

Numbers examined for:

    Teeth                       180
    Eyes                        700
    Hearing                     220
    Eyes and teeth               90
    Eyes and hearing            170
    Teeth and hearing            80
    Eyes, teeth, and hearing     40

The records of these cases were accidentally destroyed and it was not
known how many of the children had actually been examined. What
is the probability that a particular child was not examined?

By Theorem IX, the number of children not examined is
1,000 - 1,100 + 340 - 40 = 200.

Hence the required probability is 200/1,000 = 0.2.

† An equivalent theorem is given by Poincaré, Calcul des Probabilités; a
particular form will be found in Whitworth, Choice and Chance, Chap. II.

Ex. 3. A certain factory produces and tests 7,000 motor-cars per
year. The possible defects are catalogued as follows:

B = bodywork, C = chassis, E = engine, I = instruments.

Thus BCE denotes a case of compound defect in ‘bodywork, chassis, and
engine’. A year’s record of defects is shown in the accompanying table:

    B = 120     BC = 50     BCE = 24     BCEI = 2
    C = 150     BE = 40     BCI = 15
    E = 185     BI = 23     BEI = 5
    I = 200     CE = 55     CIE = 10
                CI = 35
                EI = 28

Find the percentage of cars which pass all four tests at the first trial.
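One way of organizing the arithmetic of Ex. 3 is sketched below. It is an illustration rather than the authors' worked solution, and it assumes that each tabulated figure is to be read in the sense of Theorem IX, i.e. as the number of cars showing at least the defects named. The dictionary and function names are our own.

```python
# Recorded defects, keyed by the letters of the defects involved.
defects = {
    "B": 120, "C": 150, "E": 185, "I": 200,
    "BC": 50, "BE": 40, "BI": 23, "CE": 55, "CI": 35, "EI": 28,
    "BCE": 24, "BCI": 15, "BEI": 5, "CIE": 10,
    "BCEI": 2,
}
CARS_TESTED = 7000


def cars_with_some_defect(counts):
    """Singles minus pairs plus triples minus the quadruple (Theorem IX)."""
    return sum((-1) ** (len(key) + 1) * value for key, value in counts.items())


defective = cars_with_some_defect(defects)   # 655 - 231 + 54 - 2
passing = CARS_TESTED - defective
percentage = 100 * passing / CARS_TESTED

print(defective, passing, percentage)
```

On this reading 476 cars show at least one defect, so that 6,524 of the 7,000 cars, or about 93.2 per cent., pass all four tests.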


Ex. 4.† The number of ways in which a row of n objects can
be deranged, so that no object remains in its proper place, is the
integer nearest to n!/e.

For the total number of arrangements of the objects is
N = n!. Of these, the number of arrangements in which one
given object is in its proper place is $N_1 = (n-1)!$; the
number for which two given objects are in their proper places
is $N_2 = (n-2)!$, and so on.

Hence, by Theorem IX, the number of arrangements free
from all these restrictions (i.e. for which all the objects are
deranged) is

$$n! - \frac{n}{1!}(n-1)! + \frac{n(n-1)}{2!}(n-2)! - \dots = n!\left(1 - \frac{1}{1!} + \frac{1}{2!} - \dots \pm \frac{1}{n!}\right).$$

This number is certainly an integer; the last term is ±1, so
that if the series of terms in the brackets is replaced by $e^{-1}$, we
merely alter the required number by a fraction numerically less
than 1/(n+1); whence the result.
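The derangement count can be verified for small n. The following sketch (function names illustrative) evaluates the alternating series exactly, counts derangements by brute force, and compares both with n!/e rounded to the nearest integer. Rounding is used here because the truncated series lies alternately above and below $e^{-1}$, so that the exact count is sometimes a little more and sometimes a little less than n!/e.

```python
from itertools import permutations
from math import e, factorial


def derangements_by_series(n):
    """n! (1 - 1/1! + 1/2! - ... +/- 1/n!), evaluated in exact integers."""
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))


def derangements_by_enumeration(n):
    """Count the permutations that leave no object in its own place."""
    return sum(
        1
        for p in permutations(range(n))
        if all(p[i] != i for i in range(n))
    )


for n in range(1, 8):
    print(n, derangements_by_series(n), round(factorial(n) / e))
```

For n = 4, for instance, the count is 9 while 4!/e is about 8.83.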


Ex. 5. Two shuffled packs of 52 cards are dealt by two players, each
dealing a card simultaneously. Show that the probability that all the
52 pairs of cards so dealt will be different is approximately 1/e.

We may take one of the packs as specifying the order, which may be
one of 52! arrangements. Then the number of ways in which the second
pack may be arranged so that no card is in its proper place is 52!/e,
approximately. Hence the required probability is 1/e, approximately.

The probability that identical cards will be dealt on at least one
occasion is therefore 1 - (1/e).


† This proposition is a variant of one due to Montmort (1708).

Ex. 6. A man writes a number (not less than nine) of letters and
their corresponding envelopes. If the letters are inserted in the envelopes
irrespective of the addresses, show that the probability that all
the letters will go wrong is approximately 1/e.

Theorem X. The number of ways in which n different objects
can be arranged in r or fewer parcels is $r^n$.

For each of the objects can be assigned to any one of the
r parcels in r ways, and this gives $r^n$ arrangements in all.

Theorem XI. The number of ways in which n different objects
can be arranged in exactly r parcels is the coefficient of $x^n$ in the
expansion of $n!\,(e^x - 1)^r$.

For, by the last theorem, the number of arrangements in
which blanks are admissible is $r^n$. The number of arrangements
in which one assigned parcel is left blank is $(r-1)^n$, and so on.
Hence, by Theorem IX, the number of arrangements in which
no blanks are admissible is

$$r^n - \frac{r}{1!}(r-1)^n + \frac{r(r-1)}{2!}(r-2)^n - \dots \pm \frac{r(r-1)}{2!}\,2^n \mp r.$$

Now

$$(e^x - 1)^r = e^{rx} - \frac{r}{1!}\,e^{(r-1)x} + \frac{r(r-1)}{2!}\,e^{(r-2)x} - \dots .$$

Hence, by the exponential theorem, the coefficient of $x^n$ in
this expansion is

$$\frac{r^n}{n!} - \frac{r}{1!}\,\frac{(r-1)^n}{n!} + \frac{r(r-1)}{2!}\,\frac{(r-2)^n}{n!} - \dots,$$

whence the above result.
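The alternating sum obtained in the proof of Theorem XI may be compared with a direct enumeration. The sketch below is illustrative only (the function names are our own); it treats the r parcels as distinguishable, as in Theorem X, and counts the assignments of n objects in which every parcel is used.

```python
from fractions import Fraction
from itertools import product
from math import comb


def exact_parcels_by_formula(n, r):
    """r^n - r (r-1)^n + r(r-1)/2! (r-2)^n - ...  (no parcel left blank)."""
    return sum((-1) ** k * comb(r, k) * (r - k) ** n for k in range(r + 1))


def exact_parcels_by_enumeration(n, r):
    """Assign each object to a parcel and keep the cases using all r parcels."""
    return sum(
        1
        for assignment in product(range(r), repeat=n)
        if len(set(assignment)) == r
    )


print(exact_parcels_by_formula(4, 2), exact_parcels_by_enumeration(4, 2))

# Ten throws of a die, every face appearing at least once.
print(Fraction(exact_parcels_by_formula(10, 6), 6 ** 10))
```

With n = 4 and r = 2 both counts are 14. Taking the ten throws of a die as the objects and the six faces as the parcels gives 16,435,440 favourable cases out of $6^{10}$, a fraction which reduces to 38,045/139,968.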

Theorem XII. The number of ways in which n identical objects
can be distributed into r parcels such that no parcel contains fewer
than q objects or more than q+t-1, is the coefficient of $x^{n-qr}$ in
the expansion of $(1-x^t)^r(1-x)^{-r}$.

It is clear that the required number is the coefficient of $x^n$ in
the product of the r factors

$$(x^q + x^{q+1} + \dots + x^{q+t-1})(x^q + x^{q+1} + \dots + x^{q+t-1})\cdots(x^q + x^{q+1} + \dots + x^{q+t-1}),$$

that is, in $x^{qr}(1 + x + x^2 + \dots + x^{t-1})^r$,
or in $x^{qr}(1-x^t)^r/(1-x)^r$.

Hence the number sought is the coefficient of $x^{n-qr}$ in
$(1-x^t)^r(1-x)^{-r}$.

Ex. 1. A die whose faces are numbered from 1 to 6 is thrown four
times; in how many ways can the number 8 be obtained in the four
throws?

In this case we require the coefficient of $x^8$ in the product
$(x + x^2 + \dots + x^6)^4$,

i.e. the coefficient of $x^4$ in the product $(1 + x + x^2 + \dots + x^5)^4$.

To find this coefficient we write the latter expression as $(1-x^6)^4/(1-x)^4$
and, supposing that |x| < 1, we expand $(1-x)^{-4}$ as a binomial series.
Thus we require the coefficient of $x^4$ in

$$(1 - 4x^6 + \dots)(1 + 4x + 10x^2 + 20x^3 + 35x^4 + \dots),$$

i.e. 35.

Note that the total number of possible combinations of the numbers
1 to 6, in four throws, is the sum of all the coefficients in $(x + x^2 + \dots + x^6)^4$,
and this is obtained by putting x = 1; the number is therefore $6^4$.
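Expanding $(1-x^t)^r$ and $(1-x)^{-r}$ by the binomial theorem turns the coefficient in Theorem XII into a finite alternating sum, which is convenient for computation. The sketch below is an illustration under that reading; the function names are our own, and the parcels are taken to be distinguishable (as the four throws of the die are).

```python
from itertools import product
from math import comb


def bounded_distributions_by_formula(n, r, q, t):
    """Coefficient of x^(n - q r) in (1 - x^t)^r (1 - x)^(-r)."""
    m = n - q * r
    if m < 0:
        return 0
    total = 0
    for k in range(r + 1):
        rest = m - k * t
        if rest < 0:
            break
        # (-1)^k C(r, k) x^(k t) from the first factor, and the
        # coefficient C(rest + r - 1, r - 1) of x^rest from the second.
        total += (-1) ** k * comb(r, k) * comb(rest + r - 1, r - 1)
    return total


def bounded_distributions_by_enumeration(n, r, q, t):
    """Try every parcel content from q to q + t - 1 and keep the totals of n."""
    return sum(
        1
        for parts in product(range(q, q + t), repeat=r)
        if sum(parts) == n
    )


# Ex. 1: four throws of a die (q = 1, t = 6) giving a total of 8.
print(bounded_distributions_by_formula(8, 4, 1, 6))
print(bounded_distributions_by_enumeration(8, 4, 1, 6))
```

Both functions give 35 for the die problem of Ex. 1.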

Ex. 2. The probability that a die which is thrown four times gives
a total of 8 is $35/6^4 = 35/1296 = 1/37$, approximately.

Ex. 3. Show that the probability that the number m will be obtained
by throwing a die r times is the coefficient of $x^m$ in the expansion of
$x^r(1-x^6)^r(1-x)^{-r}/6^r$.

Ex. 4. Given the two sets of numbers 1, 2, 3, 4, 5; 1, 3, 5, 7, 9, find
the probability that the sum of two numbers selected, one from each
group, is 8.

The number of possible pairs of numbers is $5^2 = 25$; of these the
number of pairs whose sum is 8 is evidently 3; thus the probability
is 3/25.

Ex. 5. Given the three sets of numbers 1, 2, 3, 4, 5; 1, 3, 5, 7, 9;
2, 4, 6, 8, 10, find the probability that the sum of three numbers selected,
one from each set, should be 16.

The number of sets whose sum is 16 is the coefficient of $x^{16}$ in the
product $(x + x^2 + \dots + x^5)(x + x^3 + \dots + x^9)(x^2 + x^4 + \dots + x^{10})$, i.e. the coefficient
of $x^{12}$ in $(1 + x + \dots + x^4)(1 + x^2 + \dots + x^8)^2$, which is 12.

Hence the probability is 12/125.
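Coefficients of this kind can also be found by multiplying the polynomials out directly. The sketch below is a minimal illustration; the helper names are our own, and a polynomial is represented simply as a list of coefficients indexed by the power of x.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_from_values(values):
    """The polynomial sum of x^v over the given non-negative integers v."""
    values = list(values)
    coeffs = [0] * (max(values) + 1)
    for v in values:
        coeffs[v] += 1
    return coeffs


# Four throws of a die: (x + x^2 + ... + x^6)^4.
die = poly_from_values(range(1, 7))
four_throws = [1]
for _ in range(4):
    four_throws = poly_mul(four_throws, die)

# One number from each of the three sets of Ex. 5.
three_sets = poly_mul(
    poly_mul(poly_from_values([1, 2, 3, 4, 5]), poly_from_values([1, 3, 5, 7, 9])),
    poly_from_values([2, 4, 6, 8, 10]),
)

print(four_throws[8], sum(four_throws))   # ways of throwing 8; all cases
print(three_sets[16], sum(three_sets))    # ways of totalling 16; all cases
```

The coefficient of $x^8$ in the first product is 35 out of $6^4 = 1296$ cases, and the coefficient of $x^{16}$ in the second is 12 out of 125, as found above.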

Ex. 6. A set of 10 cards is marked with the numbers 2, 4,..., 20. 
In how many ways can a total of 36 be found in a hand of 4 cards ? 

EXAMPLES ON CHAPTER VII

Ex. 1. Four men arrange to meet at the ‘White Hart’ tavern in
a certain town. It happens that there are four taverns with that name;
show that the probability that all the men choose different taverns
is 3/32.

Ex. 2. If n people seat themselves at a round table, show that the
probability that two given individuals are neighbours is 2/(n-1).

Ex. 3. A pack of 52 cards is dealt out to four players; show, by
Stirling’s Theorem, that the probability that the whole of one particular
suit is dealt to one particular player is approximately $156/10^{14}$.

Ex. 4. Show that the probability of obtaining 14 is the same with
3 dice as with 5.

Ex. 5. A die is thrown 10 times; prove that the probability that every
face appears at least once is 38,045/139,968.

Ex. 6. A set of r consecutive numbers is selected from the numbers
1, 2, ..., n; if a second set of s consecutive numbers is selected, what is the
probability that it has no number in common with the first?

Ex. 7. Find the probability of throwing not more than 8 with 3 dice. 

Ex. 8. Show that there is a greater probability of obtaining 9 in a 
single throw with 3 dice than with 2. 

Ex. 9. There are n houses in each of which the population may vary 
from 1 to n. What is the probability that the average population per 
house is 4 ? 

Ex. 10. Show that the most probable sum to be obtained by throwing
2n dice is 7n, and that with 2n+1 dice both 7n+3 and 7n+4 are
equally likely.

Ex. 11. Find the number of positive integral solutions of the equation
x + y + z + u = 12,
if the unknowns are to lie between 1 and 6.

Ex. 12. Given m kinds of objects and n of each kind, show that the
probability that m-r selected objects will be all different is
${}^{m}C_{r}\,n^{m-r} / {}^{mn}C_{m-r}$.


Ex. 13. If a coin is tossed 2n times, prove that

(i) the probability that the numbers of heads and tails obtained are
equal for the first time at the 2nth throw is ${}^{2n}C_n / 4^n(2n-1)$;

(ii) the probability that in 2n throws the numbers of heads and tails
are never equal is ${}^{2n}C_n / 4^n$;

(iii) the probability that the numbers of heads and tails have been
equal once and only once is ${}^{2n}C_n / 4^n$.

Ex. 14. Prove that the number of ways of obtaining the sum r with
n dice is

$${}^{r-1}C_{n-1} - {}^{n}C_{1}\,{}^{r-7}C_{n-1} + {}^{n}C_{2}\,{}^{r-13}C_{n-1} - \dots .$$


Ex. 15. If a coin is tossed n times, show that the probability that
there will not be a consecutive heads is the coefficient of $x^n$ in the
expansion of

$$\frac{1}{2^n}\,\frac{1 + x + x^2 + \dots + x^{a-1}}{1 - x - x^2 - \dots - x^a}.$$


Ex. 16. If m objects be distributed among a men and b women, show
that the probability that the number received by the men is odd, is
$\{\tfrac{1}{2}(b+a)^m - \tfrac{1}{2}(b-a)^m\}/(b+a)^m$ (b > a).


Ex. 17. Among a batch of 240 eggs, 12 are bad. The eggs are sent 
in cartons of a dozen to 20 different customers. Find the probability that 

(i) a particular customer will receive two or more bad eggs, 

(ii) two particular customers will receive two or more bad eggs, 

(iii) all the bad eggs are delivered to three customers. 

Ex. 18. In a street of 100 houses 25 are known to have defective
drains, 75 have broken windows, and 15 have both defective drains and
broken windows. Show that the probability that a given house is sound
in windows and drains is 3/20.

Ex. 19. $D_1$ and $D_2$ are two diseases such that the probability of any
one infected with $D_1$ acquiring $D_2$ from an infected individual is $p_1$, and
the probability of any one infected with $D_2$ acquiring $D_1$ from an
infected individual is $p_2$. Suppose that the diseases cannot be acquired
save by mutual contagion, and that $n_1$ and $n_2$ people infected with $D_1$
and $D_2$ respectively come to live in a town of n inhabitants, mixing
freely with them. What is the probability that an inhabitant will be
free from both or either of the diseases?

Ex. 20. A billposter has 100 placards to post in sets of 3 or 4. If the
placards contain 10 different types of 10 each, find the probability that
a given set of 3 will have 2 alike.



CHAPTER VIII 


THE EMPIRICAL THEORY OF DISTRIBUTIONS 

1. Hypothetical populations and typical constants 

So far we have been concerned with probability as a mathematical
subject of study, the category (1) of Chapter II. In
this section we turn to the consideration of category (2), which
concerns itself in the first place with enumerating the frequency
of occurrence of actual events in a physical problem. Once
again let us emphasize the difference between (1) and (2): in
the analysis so far developed (1) has dealt with the enumeration
of all possible arrangements that can be conceived to occur in
any given situation; on the other hand, (2) is concerned with the
actual events as they have occurred in circumstances akin to
those in which the results are to be applied. The crucial question
which has to be faced, in the use of mathematical probability in
the theory of statistics, is how the mathematical theorems of
(1) can legitimately be combined with the empirical data of (2)
to enable predictions to be made about forthcoming events of
the type (2).

We begin with a discussion of histograms, a pictorial arrangement
of physical data in a form suitable for mathematical
analysis.

Let us suppose that 100 leaves are stripped from a tree and
their mean widths measured; it is then found that these lie
between 1 and 2 inches in the proportions shown in the table.

    Width in inches     No. of leaves
      1.0 to 1.1              8
      1.1 to 1.2             10
      1.2 to 1.3             15
      1.3 to 1.4             20
      1.4 to 1.5             18
      1.5 to 1.6             11
      1.6 to 1.7              7
      1.7 to 1.8              6
      1.8 to 1.9              3
      1.9 to 2.0              2

To represent our data graphically we set off unit length on the
x-axis, divided into tenths, and at the mid-point of each interval
we erect an ordinate proportional to the number of leaves to
be found in that interval. By drawing a system of horizontal
and vertical lines as shown, we obtain a step-curve, called a
‘histogram’.

It is clear that, by reducing the ordinates in a certain ratio,
the histogram can immediately be converted into a mathematical
probability diagram; since there are 100 members of the
population considered, the proportions belonging to the subclasses
(1.0, 1.1), (1.1, 1.2), ... are respectively 8/100, 10/100, etc.
These proportions represent, in the mathematical sense, the
probability of occurrence of the subclasses among the population
of 100 leaves.

Fig. 19

Once more we stress the distinction between mathematical
and empirical probability by asking two questions:

(i) What is the mathematical probability that a leaf known
to be a member of this population of 100 leaves has a width
lying between 1.2 and 1.3 inches? The answer is 15/100.

(ii) What is the ‘probability’ that yet another leaf known to
have been stripped from the tree containing the original 100
has a width lying between 1.2 and 1.3 inches? So far we have
attached no significance whatever to this interpretation of
probability. Before any step can be taken enabling us to give
a sensible answer to this question, we require some information
concerning the nature of the larger population from which the
population of 100 leaves has been drawn, or (as is sometimes
stated) we require to know whether the latter is a ‘fair sample’
of the original population. The answer to the question, therefore,
cannot be divorced from the assumed criterion of the
‘fairness’ of the sample.


Probability Curves 

The simple laws of mathematical probability given in Chapter
IV can be illustrated from the above diagram. For example,
the probability that a leaf defined as a member of the population
of 100 has a width lying between 1.1 and 1.4 inches is

$$\frac{10 + 15 + 20}{100},$$

which is the sum of the probabilities that its width lies in
the ranges (1.1, 1.2), (1.2, 1.3), (1.3, 1.4). The probability that
the width lies somewhere in the range (1, 2) is obviously unity.
The probability that the leaf has a width lying in the range
(1.1, 1.4) is the area between the probability diagram, the x-axis,
and the ordinates at 1.1, 1.4.
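The passage from the table of leaf widths to the probability diagram can be put in the form of a short computation. The sketch below is an illustration only; the variable and function names are our own, and the function deliberately answers only for ranges whose ends coincide with class boundaries, since the table gives no information about what happens inside a class.

```python
from fractions import Fraction

# Class boundaries 1.0, 1.1, ..., 2.0 (inches) and the tabulated counts.
edges = [Fraction(10 + i, 10) for i in range(11)]
counts = [8, 10, 15, 20, 18, 11, 7, 6, 3, 2]

total = sum(counts)                              # 100 leaves
proportions = [Fraction(c, total) for c in counts]


def probability_between(low, high):
    """Sum of the class proportions for classes lying wholly in (low, high).

    Both ends must be class boundaries; otherwise the table alone
    does not determine the answer.
    """
    if low not in edges or high not in edges:
        raise ValueError("range must begin and end on class boundaries")
    return sum(
        p
        for left, right, p in zip(edges, edges[1:], proportions)
        if left >= low and right <= high
    )


print(probability_between(Fraction(11, 10), Fraction(14, 10)))   # 9/20
print(probability_between(Fraction(1), Fraction(2)))             # 1
```

The first result is 45/100 = 9/20, the sum 10/100 + 15/100 + 20/100 of the text; the second is unity.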


Frequency and Probability Curves 

If through ABC...J we draw a continuous curve such that
the area under each element of curve is equal to the area of
the corresponding rectangle in the histogram, the curve so
obtained is called the ‘frequency curve’; if the ordinates of this
curve be reduced in the ratio 1 : 100, as in the formation of the
probability diagram, we derive a ‘probability curve’. For this
curve also we can state that the probability of a leaf having a
width lying in the range (1.2, 1.6), say, is measured by the
area under the curve; that is, if y = p(x) is the equation of the
curve, the required probability is

$$P = \int_{1.2}^{1.6} p(x)\,dx.$$

It should be noted that we are not justified in stating
that the probability that a leaf has a width lying in the range
(1.23, 1.63) is $\int_{1.23}^{1.63} p(x)\,dx$. It may be convenient (as we shall
find) for the purpose of mathematical treatment to assume
that the probability of an individual specimen having a width
lying in the range (a, b) is $P = \int_a^b p(x)\,dx$; but we should have
to justify such an assumption or, alternatively, to find some
measure for the extent of the error involved in making it.

Ex. 1. In the examination of 148 pods of large yellow broom, the 
frequency of seeds in a pod was found to be as follows: 

No. of seeds 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
No. of pods 0 0 1 2 6 12 12 7 7 14 16 16 14 11 9 8 6 4 2 1 

Construct the histogram and the frequency curve for this population. 

Ex. 2. A second batch of such pods was measured and the frequency
of their lengths obtained, as follows:

    Length       Frequency
    2.2-2.8          0
    2.8-3.4          1
    3.4-4.0          3
    4.0-4.6         20.5
    4.6-5.2         11
    5.2-5.8         23.5
    5.8-6.4         10.5
    6.4-7.0          3.5

Construct the frequency curve.

Probability as a Continuous Function 

If we are to justify the above-mentioned assumption that
a probability may be regarded as a continuous function of a
variable in experimental practice, we are faced with what at
first appears to be a difficult problem concerning the continuity
of natural phenomena. We have remarked that all observations
are obtained, at some stage or other, by the use of a measuring
scale; and if the process of measurement is examined, it is
found to consist in an attempt to make two marks on the scale
coincide with two marks on the object measured. But whereas
it is possible to make one mark on the scale coincide, to our
satisfaction, with a mark on the object, the other mark in
general falls somewhere between two adjacent marks on the
scale. Even when the accuracy of the measurement is increased
by the use of a vernier, say, invariably the reading of the scale
division involves an estimate which is equivalent to stating that
the mark does not fall between two scale divisions, but on one or
other of them. There always exists a finite ‘jump’ corresponding
to the least interval which can be measured by means of the scale.

The same kind of restriction is implicit in any tabulated set
of numbers, such as a table of logarithms or trigonometric
functions; in fact, by no set of numbers or measurements can we
represent fully a continuous function. Two leaves out of a
batch of 10,000 will be classed as of equal width if with our
measuring rod we cannot detect any difference in their widths;
nevertheless the difference, if any, between two widths, that
might be detected by a more accurate process, may correspond
to a finite jump which we ignore in the measurement. While,
therefore, it is clear that all measurements obtained from
Nature must show discontinuity and all frequency curves constructed
from them ought strictly to be histograms, it would be
unreasonable to assert that for our purpose we must regard the
growth of leaves, say, necessarily as a discontinuous process.
An apparent discontinuity arises from limitations in our method
of measurement, but it is unnecessary to import these into our
analysis. From our standpoint the distinction between continuity
and discontinuity in these cases amounts to little more
than stating that we take the area between the histogram and
the x-axis to be equivalent to the area under a continuous curve
passing through the vertices of the histogram, it being supposed
that the error so committed is small. If it is a great convenience
for us to deal with a continuous curve rather than with a histogram,
the loss in accuracy, even if it were perceptible, would be
more than compensated for by the gain in power.

The Meaning of ‘Population’

Here the empirical data have been used for constructing a
histogram which in its continuous form represents the mathematical
probability curve. In passing from the former to the
latter we are in effect constructing a hypothetical population on
the basis of the experimental sample. It is customary to represent
such a continuous curve in mathematical form and then to
assume, either explicitly or implicitly, that the form so obtained
has a validity for a range of the variable much beyond that
found in the given sample. This process is tantamount to
extrapolating the population by means of a mathematical
expression.

In discussing the validity of an application of mathematical
probability or statistical theory to scientific experiment, there
are several questions that merit examination. Let us contrast,
in the first instance, the conduct of a physical experiment with
the collection of botanical data, e.g. for determining the size of
leaves on a particular type of tree. In his experiment the
physicist is able to exercise a considerable degree of control
over the situation; he can plan and lay out the environment;
he can, in general, eliminate what are called ‘systematic errors’
or even periodic fluctuations. The consequences are twofold.
In the first place he can state from the beginning that the
quantity he is measuring will lie within a prescribed and comparatively
narrow range; he will know, for example, that the
expansion of a metal rod in certain circumstances cannot be
more than 0.5 cm. or less than 0.2 cm. This he knows on the
basis of his past experience of scientific inquiry, and it would
be extraordinarily rare for an experiment to be conducted
without some such preliminary knowledge.† In the second
place, the actual experiment which he performs narrows this
range still further; the observations obtained show that the
‘true readings’ are grouped within a much smaller band of
values. Moreover, because of the fact that the experiment has
been carefully performed and the measurements made after a
series of delicate adjustments, the scientist is perfectly well
aware that to multiply the number of readings merely to satisfy
the demands of the statistician cannot possibly increase his
accuracy; they may succeed only in encouraging him to incorporate
a number of less accurate observations in his results.
When we consider the collection of botanical data the conditions
are seen to be very different. The botanist has to take the
material with which Nature provides him, largely in circumstances
over which he has no control. His data may therefore
range over wide regions; he can, like the physicist, state in
advance upper and lower limits within which his measurements
will lie, but the narrow band will be much less accurately defined:
the more observations he can collect, the greater will be
his knowledge of the features he is studying. Whereas the
physicist can proceed on the experimental assumption that
there is a definite expansion of the rod to which his measurements
are approximations, the botanist cannot assert that there
is a definite size of leaf, the ‘true’ size, to which his collection
approximates. One of the purposes of his experiment is in fact
to discover whether he can usefully apply such a fiction to his
subject-matter.

† The far-reaching effects of any exception to this rule can be seen from the
consequences of the Michelson-Morley experiment.

In the light of the above facts concerning experimental
practice in physics, it must be admitted that in many cases
there is no justification for the assertion that the limited set of
data obtained by an experimenter are a sample of a hypothetical
population or a much wider collection.† The position is different
when we are dealing with biological phenomena of the type
mentioned, for here the actual collection of data has to be seen
as a step towards the building up of the hypothetical population,
with its special conception of a ‘true’ value. This makes the
application of statistical theory to physical experiment a much
more delicate and uncertain procedure than to biological,
meteorological, or economic phenomena.

The type of collection or hypothetical population which we 
have had in mind is a static unchanging one. But such is by 
no means the only possible type. In the paper referred to above, 
Campbell illustrates the difficulty of assigning two different 
samples to different collections by considering the rainfall 
records of 1901-20. ‘Was the climate between 1901 and 1910’,
he asks, ‘different from that between 1911 and 1920? If this 
problem is statistical, the records for 1901-10 and for 1911-20 
must be samples of two possibly different collections. But 
what are the remainders of these collections? Not the records 
for other years; for, if the climate may be changing, other years 
are not comparable. But meteorological records must be records 
for some defined period. If the records for 1901-10 are a mere 
sample of the records for some longer period, and not the whole 
collection relevant to the problem, what is this longer period?’ 

The answer to these conundrums surely lies in the fact that
the climate of a country is itself a varying phenomenon and
therefore the two records for 1901-10 and 1911-20 must be
regarded as successive samples of a varying hypothetical population.
Whether these samples provide data adequate for the
drawing of valid conclusions about climatic changes as a whole
is another matter. All we wish to point out is, that unless the
records in question be regarded as successive samples of a
varying population, inconsistencies of the type indicated by
Campbell are bound to arise.

† Cf. N. Campbell, Proc. Phys. Soc. 47 (1935), 800.

But we must not over-estimate the importance of such matters
in experimental practice; we shall certainly do so if we imagine
that all experiment is necessarily individual. When scientific
method demands that a particular conclusion shall be accepted
only if it is accorded general assent, this should mean not only
that the experiment which led to it is ‘accepted’ as from one
research worker and that it can be imagined repeated if necessary,
but that it is in fact repeated by a number of other
workers. Thus, many measurements have been made of the
velocity of light, by different observers working under diverse
conditions or by the same observer using a variety of methods.
For the final conclusion to be acceptable, the collection of
data has to be regarded as a ‘fair sample’ of what scientists
who perform the experiment are likely to find. On the
other hand, the search for a true scientific entity would be
fruitless unless all the numbers obtained could be regarded as
clustering about some so-called ‘true value’. The set of observations
so found therefore embodies a series of diverse conditions
of experiment which are necessarily unspecifiable in detail; and
in essential contrast with the case of the individual experimenter,
the larger the amount of such observations, the greater
the precision with which the true value can be stated. For
this reason it is of vital importance that the mass of data found
by different observers should form a coherent collection; they
have to be unified, and the unifying process which attempts to
cancel out the numerous irrelevant circumstances is essentially
a statistical one. As we have remarked, each experimenter
will be able to state at the beginning that the quantity he proposes
to measure will lie within a prescribed, comparatively
narrow range; the fact that this range is practically identical
for all the observers is merely evidence that they all begin with
the same basic knowledge of the problem. The narrower range
which emerges in each experiment will reflect among other
things the diverse conditions of the individual experiment, and
it is these ranges that have to be dealt with in a statistical
manner. In disagreement, therefore, with the point of view
put by Campbell,† we hold that a statistical approach to
observational data derived from different observers (or from
the same observer working under different conditions) is inescapable
and is in fact fundamental in the development of
science itself.

Following up this idea, we shall seek to discover what are
the most suitable probability functions which can be utilized
in practical cases, as they occur. We are then justified in assuming
that the probability of occurrence of a variable in the range
(α, β) is not only to be obtained by computing the area between
the histogram and the x-axis, for that range, but by evaluating
$\int_\alpha^\beta p(x)\,dx$, where y = p(x) is now a continuous curve passing
through or near the vertices of the histogram.

Typical Constants 

For experimental purposes, and particularly for the construction
of hypothetical populations, it is inconvenient to handle
a mass of detailed data. It is therefore necessary to examine
whether certain characteristics of the data may suffice for the
purpose in view. We pose the general problem as follows:
Given a set of numbers $a_1$, $a_2$, ..., $a_n$, can we find a single number
which can be regarded as a measure typical of the set? Thus,
$a_1$, $a_2$, ... may be the numbers obtained in measuring a desk (as
in Chapter II), and we may inquire, can we find a single number
which can be regarded as typical and which can be referred to,
for our purposes, as the length?

If we desire to specify the set $a_1$, $a_2$, ... even more precisely
than is possible by using a single number, a second problem
arises, namely, how closely are the members of the set packed
or distributed about the ‘typical’ member? We shall, of course,
have to make precise the meaning of the word ‘typical’ in the
given context. That this second problem is closely connected
with the concept of frequency is seen if we state it in this way:
How frequently do the measured members of the set fall into
the successive ranges of, say, 0.1, measured from a ‘typical’
member? These two questions require to be answered very
precisely before further steps can be taken to handle a set of
data adequately in terms of what may be called its typical
constants.

† Loc. cit. 808.

What characteristic shall we expect our first typical constant 
to possess? If it were a large positive number, the differences 
between it and the actual readings would be large also; similarly 
if it were large and negative. There should be a typical constant 
lying somewhere between these two extremes, such that the 
sum of the differences, taken positively, has a smallest value; 
it would be a number about which the set as a whole is most
closely packed, in accordance with the requirements we have 
already indicated. This suggests either that the sum of the 
absolute values of the differences between it and the actual 
readings should be a minimum, or that the sum of the even 
powers of these differences should be a minimum. Each of 
these suggestions would give us a typical constant upon which 
to base our discussion. 

Let us illustrate by a problem. Consider the set of numbers
2, 7, 5, 15, 10, 4; take any number x and write down the
differences x - 2, x - 7, etc., some of which may be positive
and some negative. The sum of the squares of these differences
is

    $(x-2)^2 + (x-7)^2 + \ldots + (x-4)^2 = y$, say.

If we plot the values of y against x we obtain a parabola whose
minimum ordinate occurs at

    $x = \dfrac{2 + 7 + \ldots + 4}{6}$ = the average

of the given numbers.

This minimum ordinate thus represents the least value of 
the sum of the squares of the deviations of x from the given 
numbers; it is attained when x has the ‘average value’. If we
define the typical constant in this case as that value of x which 
makes the sum of the squares a minimum, then we find it by 
taking the average of the given numbers. 
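
The illustration can be checked numerically. The following is only a sketch: the helper names `sum_sq` and `sum_abs` and the search grid are assumptions introduced here, not part of the text. It scans candidate values of x and confirms that the sum of squares is least at the average, while the sum of absolute differences (the other suggestion made above) is smaller at x = 6 than at the average, so that the two criteria need not lead to the same typical constant.

```python
# Sketch only: the helper names and the grid of candidate values are assumptions.

def sum_sq(x, data):
    """Sum of the squares of the differences between x and each reading."""
    return sum((x - a) ** 2 for a in data)

def sum_abs(x, data):
    """Sum of the differences between x and each reading, taken positively."""
    return sum(abs(x - a) for a in data)

data = [2, 7, 5, 15, 10, 4]
average = sum(data) / len(data)          # 43/6

# Candidate values 0.000, 0.001, ..., 20.000.
grid = [i / 1000 for i in range(0, 20001)]
best_sq = min(grid, key=lambda x: sum_sq(x, data))

print("average:", average)
print("grid value minimizing the sum of squares:", best_sq)
print("sum of absolute differences at x = 6:", sum_abs(6, data))
print("sum of absolute differences at the average:", sum_abs(average, data))
```
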

The proposition is true in general. Thus, let $a_1, a_2, \ldots, a_n$ be a
set of numbers, of which x is the typical value. The sum of the
squares of the deviations is

    $y = (x-a_1)^2 + (x-a_2)^2 + \ldots + (x-a_n)^2$.

This attains its minimum value when $dy/dx = 0$, i.e. when

    $(x-a_1) + (x-a_2) + \ldots + (x-a_n) = 0$,

so that the required value of x is $(a_1 + a_2 + \ldots + a_n)/n$, the average
value.

The minimum value divided by n is called the square of the
standard deviation $\sigma$: or if a is the average value of $a_1, a_2, \ldots, a_n$,
we have

    $\sigma = \sqrt{\{(a-a_1)^2 + (a-a_2)^2 + \ldots + (a-a_n)^2\}}\,/\sqrt{n}$.

[Note upon ‘average’ and ‘mean’

If a train travelling between two stations changes its speed 
steadily from 40 to 50 miles per hour, its average speed is 45 
miles per hour. If the passengers in the train have heights 
varying from 5 ft. 5 in. to 6 ft. 1 in., they may have an average 
height of, say, 5 ft. 8 in. In the first case it is legitimate to 
assume that at some point in the journey the train has actually 
been travelling at 45 miles per hour; in the second it does not 
follow that any one of the passengers has a height of 5 ft. 8 in. 
—if we refer to it as a height, it is a fictitious one. 

It is usual to apply the terms ‘average’ and (arithmetic)
‘mean’ indiscriminately to these two cases; but since a real distinction
exists between them it would perhaps be worth while,
for the sake of clarity, to say that the mean speed of the train
is 45 miles per hour, while the average height of the passengers
is 5 ft. 8 in. A member of the class would then occupy the
position of the mean, but there need be no member of the class
which possesses the ‘average’ characteristic.]

We remark that $\sigma$ and a are both ‘typical’ constants, although
the former has been found in the attempt to discover the latter.
Since $\sigma^2$ is the mean value of the squares of the deviations of
each member of $a_1, a_2, \ldots, a_n$ from its average, $\sigma$ (the ‘root
mean square’) gives us an overall measure of the deviation of
the set from the average a, without reference to sign.

There are two other features of the set which are sometimes
found useful. Suppose that a frequency diagram has been constructed
in which the ordinates represent the number of readings
lying in successive intervals. The interval in which the
ordinate attains its maximum clearly corresponds to the most
frequent or ‘most fashionable’ value of x among the set. This
value is called the mode; in general it is not identical with the
average or mean, but it will be if the frequency curve is symmetrical
about the mean value. A frequency curve may have
more than one mode; but we are here concerned only with
cases in which a single mode exists.

Again, we may arrange our data in ascending order of
magnitude and divide them into two sections half-way, so
that as many measurements lie above this division as below
it. This position is called the median and is such that the
probability of any member of the set lying below (or above)
it is $\tfrac{1}{2}$.

Measure of the significance of $\sigma$

The magnitude of $\sigma$ alone may not, of course, provide us
with all the information we may desire, even when it is associated
with the average value a. As a next step we may inquire
how many of our readings lie within the range $(-\sigma, \sigma)$ about a,
and how many outside; or, as we may ask, what is the probability
that a member of the set deviates from a by more than $\sigma$?

The answer to this question may be found at once from a knowledge
of the average, the standard deviation, and a histogram
or a graph of the frequency curve. In the frequency diagram
we erect ordinates at the points $x = a - \sigma$, $x = a + \sigma$; the number
of observations which fall within the range indicated, divided
by the total number, is a measure of the probability of the sub-class
whose deviations from the average are less in absolute
value than the standard deviation. In accordance, therefore,
with our definition, this determines the probability that any
individual observation, as a member of the hypothetical population
specified by the continuous curve, has a deviation less
than $\sigma$; if this probability is ‘high’, the set is ‘closely packed’
about the average. We note that ‘high’ is here a matter of
judgement.

If p(x) is the probability function in the given case, then the
required probability is evidently $\int_{a-\sigma}^{a+\sigma} p(x)\,dx$. It is usual to take
the origin of coordinates at x = a, since many frequency curves
are symmetrical about the ordinate erected there. If P(x) is the
transformed probability function, the probability is now

    $\int_{-\sigma}^{\sigma} P(x)\,dx$.

An alternative constant associated with the distribution is
suggested by the question: for what deviation from the average
is it equally probable that an observation will fall within, as
without, the range? Analytically, we inquire for what deviation
$\lambda$ is

    $\int_{-\lambda}^{\lambda} P(x)\,dx = \tfrac{1}{2}$?

In any given case, the value of $\lambda$ can be determined by actual
enumeration or, if the hypothetical frequency curve has been
constructed, by any method for evaluating areas. The value of
$\lambda$ so defined is called the ‘probable error’ (a misnomer if by
that term we are led to conceive of it as the most probable
error). If a deviation from the average is indeed to be regarded
as an ‘error’, as though the average were the ‘truth’,† then
every error has its appropriate probability. In the case where
the deviation is $\pm\sigma$ we take the probability to measure the
extent to which the observations are packed about the average;
in the case where the error is $\lambda$, the probability is $\tfrac{1}{2}$.

Thus we have been led to a succession of typical constants in 
the attempt to specify a distribution. These are 

(1) the average a ; 

(2) the standard deviation $\sigma$ about the average;

(3) the probability p that an observation has a deviation 
from the average of less than the standard deviation; 

(4) the probable error. 

Which of the constants will suffice in any given case depends
on their magnitude, our judgement of what their magnitude
implies, and the purpose for which the data are to be used. If
the standard deviation is small, then the average itself may
suffice; if the probability p is great (i.e. in the neighbourhood
of 1), then a and $\sigma$ may suffice. If not, the ‘probable error’ gives
us some further indication of the extent to which the distribution
curve is dispersed about the average. We shall analyse
these circumstances in greater detail when we come to study
particular forms of probability curves.

† See note on ‘average’ and ‘mean’, p. 112.
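
The four constants just listed can be obtained from a finite set by direct enumeration. The sketch below is illustrative only: the function name `typical_constants` is an assumption, and so is the convention of taking the probable error as the median of the absolute deviations (the midpoint of the two central values when n is even), which is one way of making ‘as many within as without’ definite for a small set.

```python
import math

def typical_constants(data):
    """Return (average, standard deviation, p, probable error) for a finite set.

    p is the fraction of readings deviating from the average by less than
    the standard deviation. The probable error is taken here, by assumption,
    as the median of the absolute deviations.
    """
    n = len(data)
    a = sum(data) / n
    sigma = math.sqrt(sum((x - a) ** 2 for x in data) / n)
    devs = sorted(abs(x - a) for x in data)
    p = sum(1 for d in devs if d < sigma) / n
    mid = n // 2
    probable = devs[mid] if n % 2 else (devs[mid - 1] + devs[mid]) / 2
    return a, sigma, p, probable

a, sigma, p, probable = typical_constants([2, 7, 5, 15, 10, 4])
print("average:", a)
print("standard deviation:", sigma)
print("fraction within one standard deviation:", p)
print("probable error (by enumeration):", probable)
```

For so small a set the values are of course only illustrative; the point is that each constant is found by counting, without any assumption about the form of the frequency curve.
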

Definition of Weights 

If $x_1, x_2, \ldots, x_n$ are a set of observations such that $x_1$ occurs $p_1$
times, $x_2$ occurs $p_2$ times, ..., and $x_n$, $p_n$ times, then the total
number of observations present is

    $p_1 + p_2 + \ldots + p_n = \sum p$.

The sum of the observations is

    $p_1 x_1 + p_2 x_2 + \ldots + p_n x_n = \sum px$.

Thus the average $a = \sum px \,/ \sum p$.

The numbers $p_1, p_2, \ldots, p_n$ are called the weights of the observations
$x_1, x_2, \ldots, x_n$. It is clear from the formula that all the
weights may be multiplied by the same arbitrary constant
without affecting the value of the average. If at points whose
abscissae are $x = x_1, x_2, \ldots, x_n$, ordinates of lengths $p_1, p_2, \ldots, p_n$
are erected, the diagram so obtained is a histogram, as we have
already seen.
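
A short sketch of the weighted average follows; the function name and the sample values are assumptions chosen for illustration. It also confirms the remark that multiplying every weight by the same constant leaves the average unchanged.

```python
def weighted_average(values, weights):
    """Average a = (sum of p*x) / (sum of p) for observations x with weights p."""
    return sum(p * x for x, p in zip(values, weights)) / sum(weights)

# Hypothetical observations: 1.0 occurs 3 times, 2.0 once, 4.0 twice.
values = [1.0, 2.0, 4.0]
weights = [3, 1, 2]

a = weighted_average(values, weights)
a_scaled = weighted_average(values, [10 * p for p in weights])

print("weighted average:", a)
print("after multiplying all weights by 10:", a_scaled)
```
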

Typical constants for a continuous distribution 

All the constants so far defined are relevant to actual experimental
data. We have seen that when we proceed to replace
the histogram by a continuous probability curve we are in
effect postulating a hypothetical population. We proceed
now, therefore, to the derivation of analogous constants for the
latter. Let the equation of the hypothetical probability curve
be y = p(x); we now seek a typical constant a by forming the
sum of the squares of the deviations of x from this constant.
Thus, since a deviation x - a in the interval dx occurs p(x)
times, the sum of the squares of the deviations is represented by
$I = \int_\alpha^\beta (x-a)^2 p(x)\,dx$, where $\alpha$ and $\beta$ specify the range of the
probability curve. We wish to find the value of a, if any, for
which this integral is a minimum. We have

    $I = \int_\alpha^\beta (x-a)^2 p(x)\,dx
       = \int_\alpha^\beta x^2 p(x)\,dx - 2a \int_\alpha^\beta x\,p(x)\,dx + a^2 \int_\alpha^\beta p(x)\,dx$.

Now I will be a maximum or minimum with respect to a if
dI/da = 0, that is, if

    $-2 \int_\alpha^\beta x\,p(x)\,dx + 2a \int_\alpha^\beta p(x)\,dx = 0$.

Thus

    $a = \int_\alpha^\beta x\,p(x)\,dx \Big/ \int_\alpha^\beta p(x)\,dx$,

giving a value for a which obviously corresponds to the average
of a set of observations when the number of such observations
is finite.

That the value of a so found makes I a minimum follows
from the fact that $d^2I/da^2 = 2 \int_\alpha^\beta p(x)\,dx$, which is necessarily positive
since p(x) is everywhere positive.

We have thus obtained an extended concept of an average;
by analogy, the standard deviation $\sigma$ for the hypothetical
population is defined by the relation

    $\sigma^2 = \int_\alpha^\beta (x-a)^2 p(x)\,dx \Big/ \int_\alpha^\beta p(x)\,dx$

    $\phantom{\sigma^2} = \Big\{ \int_\alpha^\beta x^2 p(x)\,dx - 2a \int_\alpha^\beta x\,p(x)\,dx + a^2 \int_\alpha^\beta p(x)\,dx \Big\} \Big/ \int_\alpha^\beta p(x)\,dx$

    $\phantom{\sigma^2} = \int_\alpha^\beta x^2 p(x)\,dx \Big/ \int_\alpha^\beta p(x)\,dx - \Big\{ \int_\alpha^\beta x\,p(x)\,dx \Big/ \int_\alpha^\beta p(x)\,dx \Big\}^2$,

in virtue of the expression found for a.

By calculating $\sigma$ in any given case we can once again estimate
the probability that any number in the range $(\alpha, \beta)$ will differ
from the average by less than $\sigma$.


Ex. Suppose that a hypothetical population ranges in magnitude
from 0 to 2 with a frequency which between 0 and 1 is given by
p(x) = x, and between 1 and 2 by p(x) = 2 - x.

Then $\int_0^2 x\,p(x)\,dx = \int_0^1 x^2\,dx + \int_1^2 x(2-x)\,dx = 1$.

Also $\int_0^2 p(x)\,dx = 1$.

Hence the average a = 1.

The standard deviation $\sigma$ is given by

    $\sigma^2 = \int_0^2 x^2 p(x)\,dx \Big/ \int_0^2 p(x)\,dx - a^2 = \tfrac{1}{6}$.

Hence $\sigma = 1/\sqrt{6}$.

The probability that a member of the set between 0 and 2 will differ
from the average by less than $1/\sqrt{6}$ is

    $\Big\{ \int_{1-1/\sqrt{6}}^{1} x\,dx + \int_{1}^{1+1/\sqrt{6}} (2-x)\,dx \Big\} \Big/ \int_0^2 p(x)\,dx = \dfrac{2\sqrt{6}-1}{6}$.
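
The figures of this example may be verified by numerical integration. The sketch below uses a simple midpoint rule; the helper `integrate` and the number of steps are assumptions made for the purpose of illustration.

```python
import math

def p(x):
    """Frequency of the example: x on (0, 1), 2 - x on (1, 2), zero elsewhere."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation to the integral of f from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

total = integrate(p, 0, 2)
a = integrate(lambda x: x * p(x), 0, 2) / total
variance = integrate(lambda x: x * x * p(x), 0, 2) / total - a * a
sigma = math.sqrt(variance)
prob = integrate(p, a - sigma, a + sigma) / total

print("average:", a)
print("sigma squared:", variance)
print("probability of a deviation less than sigma:", prob)
print("value of (2*sqrt(6) - 1)/6:", (2 * math.sqrt(6) - 1) / 6)
```
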


Tchebycheff’s Theorem 

Let $x_1, x_2, \ldots, x_n$ be a set of n numbers; their mean $\bar{x}$ and
standard deviation $\sigma$ are then given by

    $n\bar{x} = \sum_{r=1}^{n} x_r$,   (1)

    $n\sigma^2 = \sum_{r=1}^{n} (x_r - \bar{x})^2$.   (2)

If $\lambda$ is any positive proper fraction it follows that not more
than $\lambda^2 n$ of the x’s can deviate by more than $\sigma/\lambda$ from $\bar{x}$. For
suppose that $\lambda^2 n$ of them deviated to this extent at least from
$\bar{x}$; then the sum of the squares of their deviations would exceed

    $\lambda^2 n \cdot \dfrac{\sigma^2}{\lambda^2} = n\sigma^2$,

which is a contradiction of (2).

It follows that whatever the nature of the distribution, the
proportion of x’s deviating from the mean by more than

    $2\sigma$ is less than $\tfrac{1}{4}$,
    $3\sigma$ is less than $\tfrac{1}{9}$,
    $4\sigma$ is less than $\tfrac{1}{16}$.

118 EMPIRICAL THEORY OF DISTRIBUTIONS Chap.VIII,§l 

These figures provide an upper limit to the probability that 
a member of a set deviates from the mean by more than a given 
multiple of the standard deviation. For any given distribution 
this probability is, of course, easily calculated. 
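
As a small check of the theorem, the following sketch counts, for an arbitrary set of numbers (the data and the function name `fraction_beyond` are assumptions chosen here), the proportion deviating from the mean by more than k standard deviations and compares it with the limit $1/k^2$.

```python
import math

def fraction_beyond(data, k):
    """Proportion of the set deviating from its mean by more than k standard deviations."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return sum(1 for x in data if abs(x - mean) > k * sigma) / n

# A deliberately lopsided hypothetical set containing one outlying value.
data = [2, 7, 5, 15, 10, 4, 1, 40, 3, 6]

for k in (2, 3, 4):
    print(k, "sigma: observed", fraction_beyond(data, k), " limit", 1 / k ** 2)
```

The observed proportions lie well inside the limits, as is usual: the theorem gives an upper bound valid for every distribution, not an estimate for any particular one.
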

2. The Gaussian Law 

In discussing the specification of typical constants we proceeded
from the assumption, unwarranted except for its general
plausibility, that one of these constants is such that the sum
of the squares of the deviations of observations from it should
be a minimum. We propose in this section to carry the problem
of typical constants a stage further, by an elementary study of
a number of hypothetical populations.

Let $y = \phi(x)$ be the equation to a probability curve giving the
probability of an observation of measure x; we shall suppose that
all the measurements which might be made in the given case,
by a particular process, conform to this law of probability. Let
$x_1, x_2, \ldots, x_n$ be a set of n of these measurements. Suppose that
we move the origin of coordinates to a point on the x-axis, distant
a from the present origin, where a is to be specified; the equation
of the probability curve is now $y = \phi(\xi)$, where $\xi = x - a$, and
the deviations of the given observations from a are

    $\xi_1 = x_1 - a,\quad \xi_2 = x_2 - a,\quad \ldots,\quad \xi_n = x_n - a$.

The probabilities that deviations $\xi_1, \xi_2, \ldots, \xi_n$ separately
occur are $\phi(\xi_1), \phi(\xi_2), \ldots, \phi(\xi_n)$. Hence the compound probability
that out of all the possible deviations that might occur when
n observations are made, precisely this combination arises, is
the product $P = \phi(\xi_1)\phi(\xi_2)\ldots\phi(\xi_n)$.

We shall define the typical constant a to be such as to make
the probability of precisely the deviations $\xi_1, \xi_2, \ldots, \xi_n$ occurring greater
than that for any other value of the constant.† That P attains
its greatest value for some value of a does not necessarily mean
that it attains a mathematical maximum, if we restrict a to lie
in the range of the observations $x_1, x_2, \ldots, x_n$. We may suppose
that a and $\xi_1, \xi_2, \ldots, \xi_n$ vary continuously over this range,
but even then P is not necessarily a continuous function of a.
(We have already seen that a probability curve, for a set of
given observations, is not necessarily continuous.) Accordingly
we make the following additional assumptions:

(1) P, regarded as a function of $\xi_1, \xi_2, \ldots, \xi_n$ and a, is continuous
and differentiable over the whole range.

(2) There is a single greatest value of P in the range which is
also a maximum of the function.

(3) The value of a which makes P a maximum is the average
value already determined.

† This principle, in extended form, is applied in Chapter IX for the determination
of hypothetical populations in general.

We shall have to examine whether these assumptions can be
fulfilled; certainly they impose restrictions on the nature of the
probability function which will be reflected in the form eventually
found for it. Whether they are such as to make the results
inapplicable in practice is at the moment an open question.

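Differentiating the function P logarithmically, we obtain the
first condition for a maximum,

    $\dfrac{\phi'(\xi_1)}{\phi(\xi_1)}\dfrac{d\xi_1}{da} + \dfrac{\phi'(\xi_2)}{\phi(\xi_2)}\dfrac{d\xi_2}{da} + \ldots + \dfrac{\phi'(\xi_n)}{\phi(\xi_n)}\dfrac{d\xi_n}{da} = 0$.   (1)

But since $\xi_1 = x_1 - a$, $\xi_2 = x_2 - a$, ..., $\xi_n = x_n - a$, we have

    $\dfrac{d\xi_1}{da} = \dfrac{d\xi_2}{da} = \ldots = \dfrac{d\xi_n}{da} = -1$.

Hence

    $\dfrac{\phi'(\xi_1)}{\phi(\xi_1)} + \dfrac{\phi'(\xi_2)}{\phi(\xi_2)} + \ldots + \dfrac{\phi'(\xi_n)}{\phi(\xi_n)} = 0$.   (2)

Since, by hypothesis, a is the average of $x_1, x_2, \ldots, x_n$,

    $\xi_1 + \xi_2 + \ldots + \xi_n = 0$.   (3)

Combining (2) and (3) we have

    $\sum_{r=1}^{n} \Big\{ \dfrac{\phi'(\xi_r)}{\phi(\xi_r)} - \lambda\xi_r \Big\} = 0$,

where $\lambda$ is any constant. Since $\xi_1, \xi_2, \ldots, \xi_n$ are subject only to
the relation (3), it follows from this equation that

    $\dfrac{\phi'(\xi_r)}{\phi(\xi_r)} = \lambda\xi_r \quad (r = 1, 2, \ldots, n)$.

These equations are all particular cases of the equation

    $\dfrac{\phi'(\xi)}{\phi(\xi)} = \lambda\xi$,

the integral of which is $\phi(\xi) = Ae^{\lambda\xi^2/2}$. Apart from the constants
A and $\lambda$, this determines the general form of the
probability function which we have been led to by our
assumptions.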

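It remains to determine whether we can choose $\lambda$ so as to
make P a maximum and not a minimum. Since P will be a
maximum when log P is a maximum, P must satisfy the
further condition that

    $\dfrac{d^2}{da^2}(\log P) < 0$, for the specified value of a.

Now this requires that

    $\sum \Big\{ \dfrac{\phi''(\xi)}{\phi(\xi)} - \Big( \dfrac{\phi'(\xi)}{\phi(\xi)} \Big)^2 \Big\} < 0$,

where $\phi(\xi) = Ae^{\lambda\xi^2/2}$, $\phi'(\xi) = \lambda\xi\phi(\xi)$,

    $\phi''(\xi) = \lambda\phi(\xi) + \lambda\xi\phi'(\xi)$.

Thus the condition reduces to

    $\sum_{1}^{n} \lambda < 0$,

so that $\lambda$ must be negative. We then write

    $\lambda = -2h^2$, so that $\phi(\xi) = Ae^{-h^2\xi^2}$.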

Since $\phi(\xi)$ is a probability function its total integral over the
range of variation of $\xi$ must be unity. Evidently our assumptions
have led us to a function which does not vanish outside a
finite range for $\xi$, but which admits the possibility of observations
differing from the average by any number, however great.
Clearly this result is a violation of the most elementary practice
in observational work and is thus a measure of the extent to
which assumptions (1) and (2) lead to hypothetical populations
that are not consistent with practice. We shall discuss these
limitations later; for the moment we use the fact that the range
of the variable $\xi$ must be taken as extending from $-\infty$ to
$+\infty$. Hence we have

    $\int_{-\infty}^{\infty} Ae^{-h^2\xi^2}\,d\xi = 1$,

from which it follows that $A = h/\sqrt{\pi}$ (p. 123).

Thus $\phi(\xi) = (h/\sqrt{\pi})\,e^{-h^2\xi^2}$, the Gaussian error law.
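
The defining property used in this derivation, that the compound probability P is greatest when a is the average, can be illustrated numerically for the Gaussian form. In the sketch below the set of observations, the value h = 0.2, and the search grid are all assumptions chosen for illustration; log P is used in place of P since both attain their maximum together.

```python
import math

def log_P(a, xs, h):
    """Logarithm of P = product of phi(x_r - a), with phi(xi) = (h/sqrt(pi)) exp(-h^2 xi^2)."""
    log_norm = math.log(h / math.sqrt(math.pi))
    return sum(log_norm - h * h * (x - a) ** 2 for x in xs)

xs = [2, 7, 5, 15, 10, 4]    # hypothetical observations
h = 0.2                      # hypothetical precision constant
mean = sum(xs) / len(xs)

# Candidate values of the typical constant: 0.000, 0.001, ..., 20.000.
grid = [i / 1000 for i in range(0, 20001)]
best_a = max(grid, key=lambda a: log_P(a, xs, h))

print("average of the observations:", mean)
print("grid value of a maximizing P:", best_a)
```
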
Alternative Derivation of the Gaussian Law

Another method of obtaining the Gaussian error law rests
on assumptions of a different character. Let us seek a probability
distribution in two dimensions which is a function of
the radius vector only; that is, if x and y are the Cartesian
coordinates, the required function is of the form $\phi(r)$, where
$r^2 = x^2 + y^2$.

Fig. 21

Let P be a point (x, y) distant r from the origin O, and
situated at the centre of a small square abdc of side $\alpha$ which is
formed by drawing parallels to the axes through the points
A, B, C, D.

The probability that a point will lie in the annulus defined
by two circles with centre at O and radii r, r+dr is $\phi(r)\,dr$.
Thus the probability that a point (x, y) will lie in the interval
AB is $\phi(x)\alpha$; similarly, the probability that it will lie in the
interval CD is $\phi(y)\alpha$.

We now assume that the probability that a point will lie
inside the square abdc is the compound probability arising from
these two independent events, i.e. $\phi(x)\phi(y)\alpha^2$. This result must
remain unaltered if the axes OX, OY are rotated into the
positions OX', OY'. If we construct a small square of side $\alpha$,
as in the previous case, we thus obtain

    $\phi(x)\phi(y) = \phi(x')\phi(y')$.

If the axes OX', OY' are so chosen that OX' passes through
P(x, y), we have $x' = \sqrt{x^2+y^2}$, $y' = 0$. Thus the equation to
be satisfied by the function is

    $\phi(x)\phi(y) = \phi\{\sqrt{x^2+y^2}\}\,\phi(0)$.

Assuming that $\phi$ is differentiable and differentiating first
with respect to x and then with respect to y, we have

    $\phi'(x)\phi(y) = \dfrac{x}{\sqrt{x^2+y^2}}\,\phi'\{\sqrt{x^2+y^2}\}\,\phi(0)$,

    $\phi(x)\phi'(y) = \dfrac{y}{\sqrt{x^2+y^2}}\,\phi'\{\sqrt{x^2+y^2}\}\,\phi(0)$.

Hence $y\,\phi'(x)\phi(y) = x\,\phi(x)\phi'(y)$, or

    $\dfrac{\phi'(x)}{x\,\phi(x)} = \dfrac{\phi'(y)}{y\,\phi(y)}$.

Since x and y are independent variables, this equality can hold
only if both terms are constant; thus

    $\dfrac{\phi'(x)}{x\,\phi(x)} = A, \qquad \dfrac{\phi'(y)}{y\,\phi(y)} = A$.

Hence $\log\phi(x) = \dfrac{Ax^2}{2} + B$,

or $\phi(x) = Ce^{Dx^2}$,

where C and D are arbitrary constants. We have thus determined
the nature of the function $\phi$. We have still to insert the
condition that the total area between the probability curve and
the axis of X is unity.

Before doing so let us notice one consequence of our assumption
that the probability of a point P falling inside the square
abdc is equal to $\phi(x)\phi(y)\alpha^2$. It is clear that no probability
function which was zero outside a circle of finite radius R could
satisfy this condition, since there exist points lying outside
this circle which have x- or y-coordinates of magnitude less
than R; thus we should require the product of two finite
quantities to be zero. It follows that our assumption cannot
apply to a continuous function $\phi(r)$ which vanishes for values
of r > R. In fact $\phi(x)$ must be finite for all finite values of x, as
follows also from the result

    $\phi(x) = Ce^{Dx^2}$.

By choosing D to be negative we can, however, make $\phi(x)$
decrease rapidly as x increases. We write

    $\phi(x) = Ce^{-h^2x^2}$,

where C and h are unspecified real numbers. We now apply the
condition that the area between the probability curve and the
x-axis is unity; since the range of x is $(-\infty, \infty)$ we thus obtain

    $\int_{-\infty}^{\infty} \phi(x)\,dx = 1$.

Hence

    $1 = C\int_{-\infty}^{\infty} e^{-h^2x^2}\,dx = 2C\int_0^{\infty} e^{-h^2x^2}\,dx = \dfrac{2C}{h}\int_0^{\infty} e^{-z^2}\,dz$,

where we have written hx = z.
It can be shown that

    $\int_0^{\infty} e^{-z^2}\,dz = \dfrac{\sqrt{\pi}}{2}$.

Thus

    $\dfrac{C\sqrt{\pi}}{h} = 1$, or $C = \dfrac{h}{\sqrt{\pi}}$.

Finally, therefore, we have

    $\phi(x) = \dfrac{h}{\sqrt{\pi}}\,e^{-h^2x^2}$.


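Consider the expression

    $\int_{-\infty}^{\infty} x^2\phi(x)\,dx = \int_{-\infty}^{\infty} \dfrac{h}{\sqrt{\pi}}\,e^{-h^2x^2}x^2\,dx = \dfrac{1}{h^2\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-z^2}z^2\,dz$,

where hx = z. Now

    $\int_{-\infty}^{\infty} e^{-z^2}z^2\,dz = \int_{-\infty}^{\infty} z\,e^{-z^2}\,z\,dz = \Big[-\tfrac{1}{2}z\,e^{-z^2}\Big]_{-\infty}^{\infty} + \tfrac{1}{2}\int_{-\infty}^{\infty} e^{-z^2}\,dz = \dfrac{\sqrt{\pi}}{2}$.

Thus $\int_{-\infty}^{\infty} x^2\phi(x)\,dx = \dfrac{1}{2h^2}$.

We have seen that if we have a set of observations such that
x is the difference of an observation from the average, and
if $\phi(x)$ is the frequency with which x occurs in the set, then
$\int_{-\infty}^{\infty} x^2\phi(x)\,dx$ is an approximation to $\sigma^2$, the square of the standard
deviation. It follows that

    $\dfrac{1}{2h^2} = \sigma^2$, or $h = \dfrac{1}{\sigma\sqrt{2}}$, approximately.

The probability function then takes the form

    $\phi(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$,

where $\sigma$ is the standard deviation of the hypothetical population.

The Error Function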

The Gaussian probability curve, given by $y = \dfrac{h}{\sqrt{\pi}}\,e^{-h^2x^2}$, is
shown roughly in the accompanying diagram.

It has a maximum at x = 0, of amount $h/\sqrt{\pi}$, and points
of inflexion, found by writing $d^2y/dx^2 = 0$, at the points
$x = \pm 1/(h\sqrt{2})$. We have already shown above that $1/(h\sqrt{2}) = \sigma$,
a quantity which, for the Gaussian law, corresponds to the
standard deviation $\sigma$ for a finite set of observations.

It is clear that the greater the value of h, the more closely
is the curve concentrated about the ordinate at x = 0; thus it is
suggested that the constant h is associated with the precision of
any set of data which might conform to the Gaussian law.
The probability that a variable will have a deviation between
x and x+dx is $\dfrac{h}{\sqrt{\pi}}\,e^{-h^2x^2}\,dx$; thus the probability that a deviation
will lie in the range (a, b) is $\dfrac{h}{\sqrt{\pi}}\int_a^b e^{-h^2x^2}\,dx$. The probability that
a variable will have a deviation between $-1/(h\sqrt{2})$ and $1/(h\sqrt{2})$,
the positions of the inflexions, is

    $\dfrac{h}{\sqrt{\pi}}\int_{-1/(h\sqrt{2})}^{1/(h\sqrt{2})} e^{-h^2x^2}\,dx = \dfrac{2}{\sqrt{\pi}}\int_0^{1/\sqrt{2}} e^{-t^2}\,dt$, where t = hx.

Evaluation of the Error Function 
Because of its importance for probabilities whose frequencies
are given by the Gaussian law, the error function $\dfrac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx$ has
been studied in detail and tabulated (see Appendix). Various
methods have been adopted for this purpose. We notice, in the
first place, that a particular value of the function is known,
for $\int_0^{\infty} e^{-x^2}\,dx = \tfrac{1}{2}\sqrt{\pi}$. Thus, as in Chapter V, if we write

    $\mathrm{Erf}(x) = \dfrac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx$,

then Erf(∞) = 1, and Erf(0) = 0.

When x is small we may approximate to the value of Erf(x) as
follows.

We have

    $\int_0^x e^{-x^2}\,dx = \int_0^x \Big(1 - x^2 + \dfrac{x^4}{2!} - \dfrac{x^6}{3!} + \ldots\Big)\,dx
       = x - \dfrac{x^3}{3} + \dfrac{x^5}{5\cdot 2!} - \dfrac{x^7}{7\cdot 3!} + \ldots$

Since the series is an alternating one, the sum to two successive
terms gives an upper and lower limit to its sum. Thus, if we
reject the terms beyond $x^7$, the result will be deficient by an
amount less than $\dfrac{x^9}{9\cdot 4!}$. If this is to be less than unity in the
fourth decimal place, we require

    $\dfrac{x^9}{9\cdot 4!} < 10^{-4}$, or $x^9 < 2\cdot 10^{-2}$, approximately,

that is, x less than about 0.65.

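For large values of x we proceed differently. Integrating by
parts, we have

    $\int_x^{\infty} e^{-x^2}\,dx = \int_x^{\infty} \dfrac{1}{x}\,x\,e^{-x^2}\,dx = \dfrac{e^{-x^2}}{2x} - \dfrac{1}{2}\int_x^{\infty} \dfrac{1}{x^2}\,e^{-x^2}\,dx$

    $\phantom{\int_x^{\infty} e^{-x^2}\,dx} = \dfrac{e^{-x^2}}{2x} - \dfrac{e^{-x^2}}{2^2x^3} + \dfrac{1\cdot 3}{2^2}\int_x^{\infty} \dfrac{1}{x^4}\,e^{-x^2}\,dx$.

Continuing this process, we obtain

    $\int_x^{\infty} e^{-x^2}\,dx = \dfrac{e^{-x^2}}{2x}\Big\{1 - \dfrac{1}{2x^2} + \dfrac{1\cdot 3}{(2x^2)^2} - \dfrac{1\cdot 3\cdot 5}{(2x^2)^3} + \ldots\Big\}$.

Since the function $e^{-x^2}$ is decreasing in the range (x, ∞) it is clear
that the error involved in stopping at the fourth term is less
than $e^{-x^2}\,\dfrac{1\cdot 3\cdot 5\cdot 7}{2^4}\int_x^{\infty} \dfrac{dx}{x^8}$ numerically, i.e. less than $\dfrac{1\cdot 3\cdot 5}{2^4x^7}\,e^{-x^2}$,
the last term retained. A similar result is obtained at any stage
of the expansion.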
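
Both expansions may be compared with a library value of the error function. The sketch below is illustrative: the function names `erf_series` and `erfc_asymptotic` are assumptions introduced here, and Python's `math.erf` and `math.erfc` stand in for the printed tables. Four terms are retained in each case, as in the text.

```python
import math

def erf_series(x, terms=4):
    """Small-x approximation: (2/sqrt(pi)) * (x - x^3/3 + x^5/(5*2!) - x^7/(7*3!) + ...)."""
    s = sum((-1) ** k * x ** (2 * k + 1) / ((2 * k + 1) * math.factorial(k))
            for k in range(terms))
    return 2 / math.sqrt(math.pi) * s

def erfc_asymptotic(x, terms=4):
    """Large-x approximation to 1 - Erf(x):
    (2/sqrt(pi)) * (e^{-x^2}/(2x)) * {1 - 1/(2x^2) + 1.3/(2x^2)^2 - 1.3.5/(2x^2)^3 + ...}."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(2 * k + 1) / (2 * x * x)   # next term of the bracketed series
    return 2 / math.sqrt(math.pi) * math.exp(-x * x) / (2 * x) * total

x_small, x_large = 0.5, 3.0
print("series    :", erf_series(x_small), " library:", math.erf(x_small))
print("asymptotic:", erfc_asymptotic(x_large), " library:", math.erfc(x_large))
```

At x = 0.5 the series error is within the limit $x^9/(9\cdot 4!)$ multiplied by $2/\sqrt{\pi}$, and at x = 3 the asymptotic error is smaller than the last term retained, in agreement with the two estimates given above.
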


The Probable Error 

We define the probable error for a Gaussian distribution in
analogy with that of a finite set of observations by stating that
it is a deviation the probability of whose occurrence is $\tfrac{1}{2}$. Thus,
if r is the probable error, then

    $\dfrac{1}{\sigma\sqrt{2\pi}}\int_{-r}^{r} e^{-x^2/2\sigma^2}\,dx = \tfrac{1}{2}$.

As in previous examples, we express this integral in terms of
Erf x. Writing $h = 1/(\sigma\sqrt{2})$, we require the value of r which
makes

    $\dfrac{2}{\sqrt{\pi}}\int_0^{r/\sigma\sqrt{2}} e^{-z^2}\,dz = \tfrac{1}{2}$,

where hx = z. Thus $\mathrm{Erf}(r/\sigma\sqrt{2}) = 0.5$.
From the table we find that

    $r/(\sigma\sqrt{2}) = 0.477$,

and $r = 0.6745\,\sigma$.


As we have seen, the probable error gives the upper and lower
limits for the deviation of a variable such that the probability
of the deviation lying within those limits is equal to the probability
for which it lies outside. Or we may say: the odds in
favour of the deviation lying within the range are 1:1. We
may inquire what are the odds in favour of the deviation lying
in the ranges ±2r, ±3r, ....

Thus, the probability that the deviation will lie in the
range ±2r is

    $\dfrac{2}{\sqrt{\pi}}\int_0^{2r/\sigma\sqrt{2}} e^{-z^2}\,dz$, or $\mathrm{Erf}(2r/\sigma\sqrt{2})$.

Since $r/(\sigma\sqrt{2}) = 0.477$, the probability is Erf(0.954) = 0.82, from
the tables.

Hence the odds in favour of this range are

    0.82 : 1 - 0.82 = 9:2, approximately.

Ex. 1. Show that the approximate odds in favour of a
deviation lying in the range

    ±3r are 21:1;
    ±4r are 142:1;
    ±5r are 1,310:1;
    ±6r are 19,200:1;
    ±7r are 420,000:1;
    ±8r are $17\times 10^6$:1;
    ±9r are $10^8$:1.


Ex. 2. What is the probability that a deviation will lie in the
range $\pm\sigma$?

The required probability is

    $\dfrac{2}{\sigma\sqrt{2\pi}}\int_0^{\sigma} \exp(-x^2/2\sigma^2)\,dx = \mathrm{Erf}\Big(\dfrac{1}{\sqrt{2}}\Big)$.

From the table, $\mathrm{Erf}(1/\sqrt{2}) = 0.682$.

Thus the odds in favour are 0.682 : 1 - 0.682 = 17:8,
approximately.

Show that, for a range $\pm 2\sigma$, the probability is 0.954,
                       $\pm 3\sigma$, the probability is 0.997,
                       $\pm 4\sigma$, the probability is 0.99994.
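
The tabular values quoted in this section can be reproduced without tables. The sketch below is illustrative only: the function names are assumptions, the bisection routine is a generic root-finder chosen for simplicity, and `math.erf` takes the place of the printed table.

```python
import math

def solve_erf(target, lo=0.0, hi=5.0, iterations=60):
    """Find z with Erf(z) = target by bisection (Erf increases steadily with z)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if math.erf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = solve_erf(0.5)             # r / (sigma * sqrt(2))
ratio = z * math.sqrt(2)       # r / sigma

def prob_within_r(multiple):
    """Probability that a deviation lies within +/- (multiple * r)."""
    return math.erf(multiple * z)

def prob_within_sigma(multiple):
    """Probability that a deviation lies within +/- (multiple * sigma)."""
    return math.erf(multiple / math.sqrt(2))

def odds(p):
    """Odds in favour, expressed as a single number 'to one'."""
    return p / (1 - p)

print("r / (sigma sqrt 2):", z)
print("r / sigma         :", ratio)
for m in (2, 3, 4):
    print("within +/-", m, "r: probability", prob_within_r(m), " odds", odds(prob_within_r(m)), ": 1")
for m in (1, 2, 3, 4):
    print("within +/-", m, "sigma: probability", prob_within_sigma(m))
```
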

Applications of the Normal Law 

If the probabilities of the occurrence of x and y, two numbers
in the range $-\infty < (x, y) < +\infty$, are respectively

    $\dfrac{h}{\sqrt{\pi}}\exp(-h^2x^2)$ and $\dfrac{k}{\sqrt{\pi}}\exp(-k^2y^2)$,

x and y being chosen independently, we require the probability
that f(x, y) lies in the range

    $\mu \le f(x, y) \le \mu + \delta\mu$.

The compound probability P is clearly

    $\iint \dfrac{h}{\sqrt{\pi}}\exp(-h^2x^2)\cdot\dfrac{k}{\sqrt{\pi}}\exp(-k^2y^2)\,dx\,dy$,

integrated over the range of x and y specified by the above
inequality.

Consider as an illustration the case

    $f(x, y) = x + y$,

i.e. $\mu \le x + y \le \mu + \delta\mu$.

Then $P = \dfrac{hk}{\pi}\int_{-\infty}^{\infty} \exp(-h^2x^2)\,dx \int_{\mu-x}^{\mu+\delta\mu-x} \exp(-k^2y^2)\,dy$.

Now by the mean ordinate rule for integration we may write

    $\int_{\mu-x}^{\mu+\delta\mu-x} \exp(-k^2y^2)\,dy
       = \tfrac{1}{2}\big[\exp\{-k^2(\mu+\delta\mu-x)^2\} + \exp\{-k^2(\mu-x)^2\}\big]\,\delta\mu
       = \exp\{-k^2(\mu-x)^2\}\,\delta\mu$.

Hence $P = \dfrac{hk}{\pi}\int_{-\infty}^{\infty} \exp\{-h^2x^2 - k^2(\mu-x)^2\}\,dx\,\delta\mu$.
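Hence

    $P = \dfrac{hk\,\delta\mu}{\pi}\exp\Big(-\dfrac{h^2k^2\mu^2}{h^2+k^2}\Big)\int_{-\infty}^{\infty} \exp\Big\{-(h^2+k^2)\Big(x - \dfrac{k^2\mu}{h^2+k^2}\Big)^2\Big\}\,dx$

    $\phantom{P} = \dfrac{hk\,\delta\mu}{\sqrt{(h^2+k^2)}\,\sqrt{\pi}}\exp\Big(-\dfrac{h^2k^2\mu^2}{h^2+k^2}\Big)$,

or

    $P = \dfrac{l}{\sqrt{\pi}}\,\delta\mu\,\exp(-l^2\mu^2)$,

where

    $l^2 = \dfrac{h^2k^2}{h^2+k^2}$, i.e. $\dfrac{1}{l^2} = \dfrac{1}{h^2} + \dfrac{1}{k^2}$.

Following precisely the same line of development it is easily
verified that if f(x, y) = ax + by then the probability that
ax + by is chosen in the range $(\mu, \mu+\delta\mu)$ is

    $P = \dfrac{L}{\sqrt{\pi}}\exp(-L^2\mu^2)\,\delta\mu$,

where

    $\dfrac{1}{L^2} = \dfrac{a^2}{h^2} + \dfrac{b^2}{k^2}$.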


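Once again this is easily generalized to the following proposition:

If $x_1, x_2, \ldots, x_n$ be a set of n independently chosen numbers in the
range $(-\infty, \infty)$, and if the probabilities with which $x_1, x_2, x_3, \ldots$ are
chosen are

    $\dfrac{h_1}{\sqrt{\pi}}\exp(-h_1^2x_1^2)$, $\dfrac{h_2}{\sqrt{\pi}}\exp(-h_2^2x_2^2)$, $\dfrac{h_3}{\sqrt{\pi}}\exp(-h_3^2x_3^2)$, ...,

then the probability that

    $a_1x_1 + a_2x_2 + \ldots + a_nx_n$

shall lie in the range $(\mu, \mu+\delta\mu)$ is

    $\dfrac{l\,\delta\mu}{\sqrt{\pi}}\exp(-l^2\mu^2)$,

where

    $\dfrac{1}{l^2} = \dfrac{a_1^2}{h_1^2} + \dfrac{a_2^2}{h_2^2} + \ldots + \dfrac{a_n^2}{h_n^2}$.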

Accuracy of the Arithmetic Mean 

Let $a_1 = a_2 = \ldots = a_n = 1/n$;
then the probability of $(x_1 + x_2 + \ldots + x_n)/n$ lying in the range
$(\mu, \mu+\delta\mu)$ is

    $\dfrac{l\,\delta\mu}{\sqrt{\pi}}\exp(-l^2\mu^2)$,

where

    $\dfrac{1}{l^2} = \dfrac{1}{n^2}\Big\{\dfrac{1}{h_1^2} + \dfrac{1}{h_2^2} + \ldots + \dfrac{1}{h_n^2}\Big\}$.

If all the quantities $h_r$ have equal values, then

    $\dfrac{1}{l^2} = \dfrac{n}{h^2n^2} = \dfrac{1}{nh^2}$, or $l = h\sqrt{n}$.

Accordingly the required probability is

    $\dfrac{h\sqrt{n}}{\sqrt{\pi}}\exp(-nh^2\mu^2)\,\delta\mu$.

The fact that all the quantities $h_r$ are equal implies that all
the measures $x_r$ are equally precise, i.e. they each belong
to groups having the same standard deviation

    $\sigma = \dfrac{1}{h\sqrt{2}}$.

Now from the composite law of error of the arithmetic mean,

    $P = \dfrac{l\,\delta\mu}{\sqrt{\pi}}\exp(-l^2\mu^2)$,

and therefore the standard deviation for the arithmetic mean is

    $\dfrac{1}{l\sqrt{2}} = \dfrac{1}{h\sqrt{(2n)}} = \dfrac{\sigma}{\sqrt{n}}$.

Thus the accuracy of the arithmetic mean of n observations is $\sqrt{n}$
times that of a single observation of the system, if all are equally
good and if the deviations of the observations and of the means
satisfy the Gaussian law.
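
The combination rule $1/l^2 = \sum a_r^2/h_r^2$ and its consequence for the arithmetic mean are easily put into a short computation. The sketch below is illustrative: the function names and the numerical values of n and h are assumptions chosen here.

```python
import math

def combined_precision(coefficients, precisions):
    """Precision constant l of a_1 x_1 + ... + a_n x_n, from 1/l^2 = sum of a_r^2 / h_r^2."""
    return 1 / math.sqrt(sum((a / h) ** 2 for a, h in zip(coefficients, precisions)))

def sigma_from_precision(h):
    """Standard deviation corresponding to precision constant h: sigma = 1/(h sqrt 2)."""
    return 1 / (h * math.sqrt(2))

n, h = 25, 0.5                                  # hypothetical: 25 equally precise readings
l_mean = combined_precision([1 / n] * n, [h] * n)

print("precision of a single reading:", h)
print("precision of the mean        :", l_mean, " (h * sqrt(n) =", h * math.sqrt(n), ")")
print("sigma of a single reading    :", sigma_from_precision(h))
print("sigma of the mean            :", sigma_from_precision(l_mean))
```

With 25 readings the standard deviation of the mean is one-fifth of that of a single reading; to halve it again would require a hundred readings, which is the practical force of the $\sqrt{n}$ law.
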

Ex. Consider the probability that $(x^2+y^2)^{1/2}$ lies between $\mu$ and $\mu+\delta\mu$
when x and y are selected in the range $(-\infty, \infty)$ according to the Gaussian
law, with equal precision constants. Here

    $P = \iint \dfrac{h^2}{\pi}\exp\{-h^2(x^2+y^2)\}\,dx\,dy$,

the integral extending over the region defined by

    $\mu + \delta\mu > (x^2+y^2)^{1/2} > \mu$.

Thus

    $P = \dfrac{h^2}{\pi}\int_0^{2\pi} d\theta \int_{\mu}^{\mu+\delta\mu} r\exp(-h^2r^2)\,dr = \big[-\exp(-h^2r^2)\big]_{\mu}^{\mu+\delta\mu}$

    $\phantom{P} = \exp(-h^2\mu^2) - \exp\{-h^2(\mu+\delta\mu)^2\} = 2h^2\mu\,\delta\mu\,\exp(-h^2\mu^2)$.

Hence, also, the probability of $\sqrt{(x^2+y^2)}$ lying between $\mu_1$ and $\mu_2$ is

    $\int_{\mu_1}^{\mu_2} \exp(-h^2\mu^2)\,2h^2\mu\,d\mu = \exp(-h^2\mu_1^2) - \exp(-h^2\mu_2^2)$.
The Random Walk

On p. 81 we dealt with the problem of the Random Walk in
two dimensions where the length of each walk was specified
but the direction undetermined, all directions having an equal
probability. We turn now to an examination of the complementary
problem, in which the directions are specified but the
distances traversed in each direction are undetermined except
that they are each drawn, as it were, from stocks distributed
about the mean, according to the Gaussian law. We consider,
therefore, the simple case of two component translations x and y
at right angles.

An individual walks a distance x from a point O, then turning
at right angles walks a distance y. If the probability that x lies
between x and $x+\delta x$ is $\dfrac{h}{\sqrt{\pi}}\exp(-h^2x^2)\,\delta x$ and that y lies
between y and $y+\delta y$ is $\dfrac{k}{\sqrt{\pi}}\exp(-k^2y^2)\,\delta y$, it is required to
determine the probability that the individual is finally to be
found at a distance between $\mu$ and $\mu+\delta\mu$ from O.

Since x and y are not selected with equal precision, but according to
the laws $(h/\sqrt{\pi})\exp(-h^2x^2)$ and $(k/\sqrt{\pi})\exp(-k^2y^2)$, then

    $P = \dfrac{2hk}{\pi}\int_{-\mu}^{\mu} \exp(-h^2x^2)\,dx \times \int_{\sqrt{(\mu^2-x^2)}}^{\sqrt{\{(\mu+\delta\mu)^2-x^2\}}} \exp(-k^2y^2)\,dy$,

the factor 2 allowing for the equal contribution of negative values of y.

The limits of integration are determined from the fact that

    $\mu < \sqrt{(x^2+y^2)} < \mu+\delta\mu$,

i.e. $\sqrt{(\mu^2-x^2)} < y < \sqrt{[(\mu+\delta\mu)^2-x^2]}$.

Now $\sqrt{[(\mu+\delta\mu)^2-x^2]} = \sqrt{(\mu^2-x^2+2\mu\,\delta\mu+\delta\mu^2)}$

    $= \sqrt{(\mu^2-x^2)} + \dfrac{\mu\,\delta\mu}{\sqrt{(\mu^2-x^2)}}$,

on retaining terms of the first order in $\delta\mu$.

Thus the integral on the right becomes, since the two limits are close
together,

    $\dfrac{1}{2}\,\dfrac{\mu\,\delta\mu}{\sqrt{(\mu^2-x^2)}}\big[\exp\{-k^2((\mu+\delta\mu)^2-x^2)\} + \exp\{-k^2(\mu^2-x^2)\}\big]
       = \dfrac{\mu\,\delta\mu}{\sqrt{(\mu^2-x^2)}}\exp\{-k^2(\mu^2-x^2)\}$.
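Accordingly, 

$$P = \frac{2\mu hk}{\pi}\,\delta\mu\int_{-\mu}^{\mu}
      \frac{\exp(-h^2x^2)\exp\{-k^2(\mu^2-x^2)\}}{\sqrt{\mu^2-x^2}}\,dx.$$

Let $x = \mu\sin\theta$; then 

$$P = \frac{2\mu hk}{\pi}\,\delta\mu\int_{-\pi/2}^{\pi/2}
      \exp\{-(h^2\sin^2\theta+k^2\cos^2\theta)\mu^2\}\,d\theta$$

$$= \frac{2\mu hk}{\pi}\exp\Bigl\{-\frac{h^2+k^2}{2}\mu^2\Bigr\}\,\delta\mu
    \int_{-\pi/2}^{\pi/2}\exp\Bigl\{\mu^2\,\frac{h^2-k^2}{2}\cos2\theta\Bigr\}\,d\theta.$$

Now if $\lambda = \tfrac12\mu^2(k^2-h^2)$, then 

$$\int_{-\pi/2}^{\pi/2}\exp(-\lambda\cos2\theta)\,d\theta
  = \frac12\int_{-\pi}^{\pi}\exp(-\lambda\cos\phi)\,d\phi,\quad\text{where } \phi = 2\theta,$$

$$= \frac12\int_{-\pi}^{\pi}\Bigl(1-\lambda\cos\phi+\frac{\lambda^2}{2!}\cos^2\phi-\dots\Bigr)\,d\phi$$

$$= 2\int_{0}^{\pi/2}\Bigl(1+\frac{\lambda^2}{2!}\cos^2\phi+\frac{\lambda^4}{4!}\cos^4\phi+\dots\Bigr)\,d\phi$$

$$= \pi\Bigl[1+\frac{\lambda^2}{2^2}+\frac{\lambda^4}{(2!)^2\,2^4}+\frac{\lambda^6}{(3!)^2\,2^6}+\dots\Bigr]$$

$$= \pi J_0(i\lambda) = \pi J_0\bigl[\tfrac12 i\mu^2(k^2-h^2)\bigr].$$

Thus, finally, since $\mu$ is regarded as positive, 

$$P = 2\mu hk\exp\{-\tfrac12(h^2+k^2)\mu^2\}\,J_0(i\lambda)\,\delta\mu.$$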

We note that if h = k, then $\lambda = 0$, and 

$$P = 2\mu h^2\exp(-\mu^2h^2)\,\delta\mu.$$
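
The closed form above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names are illustrative, $J_0(i\lambda)$ is evaluated from the power series quoted in the derivation, and the comparison integrates the joint Gaussian law directly round a circle of radius $\mu$.

```python
import math

def bessel_j0_imaginary(lam, terms=60):
    """J0(i*lam) = 1 + lam^2/2^2 + lam^4/((2!)^2 2^4) + ... (series from the text)."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (lam / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def radial_density(mu, h, k):
    """Coefficient of delta-mu in P: 2*mu*h*k*exp(-(h^2+k^2)mu^2/2)*J0(i*lam)."""
    lam = 0.5 * mu * mu * (k * k - h * h)
    return (2.0 * mu * h * k * math.exp(-0.5 * (h * h + k * k) * mu * mu)
            * bessel_j0_imaginary(lam))

def radial_density_by_angle(mu, h, k, steps=2000):
    """Same quantity from direct integration of the joint law round the circle."""
    d_theta = 2.0 * math.pi / steps
    total = 0.0
    for j in range(steps):
        theta = j * d_theta
        x, y = mu * math.cos(theta), mu * math.sin(theta)
        total += math.exp(-h * h * x * x - k * k * y * y)
    return (h * k / math.pi) * mu * total * d_theta
```

With h = k the series reduces to its first term, so `radial_density` returns the simpler expression $2\mu h^2\exp(-\mu^2h^2)$ noted above.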

This problem finds an interesting application in the determination 
of persistent periodicities in observations. (See J. Bartels, Terrestrial 
Magnetism, etc., vol. 40, no. 1, 1935.) 

Ex. 1. Particles are distributed in a plane X, Y in such a 
manner that their x- and y-coordinates belong to Gaussian sets 
of standard deviation $\sigma$. 

Show that the probability that the distance from O of any 
one of them lies (a) between 0 and $\sigma$ is $(\sqrt e-1)/\sqrt e$, (b) between 
$\alpha\sigma$ and $\beta\sigma$ is $e^{-\alpha^2/2}-e^{-\beta^2/2}$, where $\alpha < \beta$. 

Ex. 2. If in the foregoing example the ‘probable distance’ R 
from O be defined as that for which it is equally probable that 
the particle will lie within it as without, show that 

$$R^2 = \sigma^2\log_e 4.$$

Show further that the region of greatest density of particles is 
in the neighbourhood of $r = \sigma$. 

The Gaussian Law and Experiment 

At this stage it is worth while reviewing again the position of 
the Gaussian law with regard to experimental observation. 
The law has been derived by us on assumptions which cannot 
be held to apply rigorously in practice (p. 120); moreover, like 
the Bernoulli law, the Gaussian law indicates what frequency 
curve will be found on these assumptions when all possible 
arrangements of the elements considered have been included. 
Now it is always possible to assume that any frequency curve 
obtained in practice represents a sample of a super-population; 
it can be regarded as a selected and not an exhaustive collection 
of the possible arrangements. This has to be borne in mind if 
we are not to apply the Gaussian law uncritically. But there 
is another and, in a sense, more fundamental objection: it may 
not be true (and there is no reason to suppose it even approximately 
true) that all the arrangements of data which might be 
chosen from the population necessarily show that the hypothetical 
population conforms to a Gaussian distribution. 

When a set of data does not so conform, one is tempted to 
assert that this circumstance arises from the fact that the data 
represent only a sample; but it may be that the original population 
is not Gaussian. The position is clearly seen from the 
investigation on p. 156; it is there shown that if a population has 
its frequency expressible as a function v(t) of a variable t 
representing some characteristic, and if in sampling the population 
at what is presumed to be a value t, we draw in sets of 
data in the neighbourhood of t, with a probability of choice 
p(x) at t+x, then the final sampling distribution is given by 

$$u(t) = \int v(t+x)\,p(x)\,dx,$$

the integral extending over the range of the sampling. It is 
clear that the form of a sample depends on the conjunction of 
the distribution in the population and the law of choice over the 
range specified. As we shall prove, when the population and 
the law of choice are Gaussian, so is the sample; but if either the 
population or the law of choice be not Gaussian, the sample 
is not Gaussian. It follows that to apply conclusions drawn 
from a Gaussian distribution to the interpretation of any group 
of samples may involve us in serious error. 

Here again we must not attempt to escape from this impasse 
by asserting that, in the last resort, the Gaussian law gives an 
idealized distribution by which to interpret any given set of 
data. There is no escaping the plain issue that every such 
interpretation must stand side by side with the assumption that 
the original population is Gaussian. 

The Significance of Deviations 

In connexion with the above remarks we may consider 
generally the problem of significance as it arises in statistical 
theory. Broadly speaking, we may say that the significance 
of a statistical constant is usually estimated by comparing it 
with the corresponding constant which would be found under 
so-called ‘conditions of randomness’; that is, by calculating 
the probability that a constant of this magnitude would be 
found under conditions in which all possible arrangements could 
occur. Thus, let us suppose that certain data are presumed 
to be measurements carried out under the same physical conditions 
on the same object, and that the deviations from the 
average have been found and the standard deviation $\sigma$ calculated 
for these observations. So far we have made no assumption 
regarding the nature of any distribution law to which the 
measurements are presumed to conform. Now suppose that one 
of them in particular appears to differ very considerably from 
the others, showing a deviation $4\sigma$, say, the deviation having 
been found by including this observation. A good experimenter 
may justifiably have his suspicions aroused as to the accuracy 
of the observation: how is he to decide whether it should be 
included or not? If he is a sensible experimenter he will know 
whether any suspicion arising in the course of his work attaches 
itself to this particular observation (no statistician could 
possibly tell him that), and in so far as he relegates his judgement 
on this matter to the statistician he is surrendering his 
function as an experimenter. All that the statistician can tell 
him is how far the set of measurements are consistent with 
some assumed law of distribution, for there is no meaning in 
the bald statement, ‘the numbers are consistent among themselves’. 
Thus, what the statistician does is to seek the probability 
that a deviation from the average as large as $4\sigma$ will be 
found from the same number of data drawn ‘at random’ from an 
original population, the structure of which he proceeds to specify. 

The experimenter has no such knowledge of this structure: 
one of the purposes of his experiment is to find it. As we have 
seen, the odds against a deviation of $4\sigma$, on the Gaussian law, 
are about $10^5 : 6$ or $10^4 : 1$, approximately. And if the experimenter 
is overwhelmed by this fact he accepts without further 
question the significance of the odds. Thus, the significance of 
the observation is referred by this process to the significance of 
a probability arising from an assumed population and, accordingly, 
the experimenter may decide to reject this observation. 
The statistician does not in fact do precisely this; he states 
that when the odds are, say, 25 : 1 against, he will advise 
rejection. The justification of this judgement is stated to be 
based on experience; but if it is, it can only be the experience 
of the experimenter reinterpreted by the statistician. 
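
The odds quoted for the Gaussian law can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names are ours, and it assumes the deviation is judged on both sides of the average, so that the relevant probability is that of $|x| > k\sigma$.

```python
import math

def two_sided_tail(k):
    """P(|deviation| > k*sigma) on the Gaussian law."""
    return math.erfc(k / math.sqrt(2.0))

def odds_against(k):
    """Odds (to one) against a deviation exceeding k*sigma."""
    p = two_sided_tail(k)
    return (1.0 - p) / p

def deviation_for_odds(odds, lo=0.0, hi=10.0):
    """Multiple k of sigma at which the odds against reach `odds`, by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if odds_against(mid) < odds:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On this reckoning `odds_against(4.0)` comes to roughly 1.6 × 10^4 to 1, in agreement with the figure above, while the 25 : 1 rule corresponds to a deviation of a little over $2\sigma$.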

3. Other forms of hypothetical populations 

In general we may say that when from a set of data, restricted 
in extent, a frequency or probability curve is constructed and 
its equation expressed by a mathematical formula, we have 
thereby invented a hypothetical population of which our data 
may be regarded as samples. There is clearly a considerable 
latitude in specifying this formula; the mathematician knows 
that through a finite number of points will pass an infinity of 
curves, so that other conditions describing the nature of the 
formula to be used must be given before we can assert that the 
final result represents the hypothetical population which satisfies 
our requirements. This problem of constructing the hypothetical 
population is simply a restatement of the above-mentioned 
problem of determining the original population when 
the data and the method by which they were selected are 
known; if, of course, no method of selection is specified, then 
all sorts of formulae can be found. A given type of formula for 
the population implies some kind of selective process, even if 
it is not explicitly stated. 

In the notation of the previous section, the sample u(t) of a 
population v(t) defined in the range $(-\infty, \infty)$ is given by the 
equation 

$$u(t) = \int_{-\infty}^{\infty} v(t+x)\,p(x)\,dx. \tag{1}$$

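Now assume that v(t) can be expanded in a Taylor series 

$$v(t+x) = v(t)+x\,v'(t)+\frac{x^2}{2!}v''(t)+\dots. \tag{2}$$

If we introduce the constants† $m_1, m_2, \dots, m_r, \dots$ defined by the 
relations 

$$m_r = \frac{1}{r!}\int_{-\infty}^{\infty} x^r p(x)\,dx, \tag{3}$$

we may write (1) in the form 

$$u(t) = v(t)+m_1v'(t)+m_2v''(t)+\dots. \tag{4}$$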

That is, the sample can be expressed in terms of the probability 
function for the hypothetical population, its derivatives, 
and the moment coefficients of the probability function of 
selection. 

We observe that if the function p(x) is a symmetrical (i.e. an 
even) function, then the coefficient $m_r$ is zero for all odd values 
of r. In this case the sample u(t) will be expressible in terms of 
v(t) and its even derivatives. 

Ex. If $p(x) = \frac{h}{\sqrt\pi}e^{-h^2x^2}$, show that 

$$m_{2r} = \frac{1}{4rh^2}\,m_{2r-2} = \frac{1}{(2h)^{2r}\,r!}.$$

It is a simple matter to invert equation (4), supposing that 
the operation of inversion is permissible. For we have, by successive 
approximation, 

$$v(t) = u(t)-m_1v'(t)-m_2v''(t)-\dots,$$

or 

$$v(t) = u(t)-m_1u'(t)-m_2\{u''(t)-m_1u'''(t)\}+\dots. \tag{5}$$

† These are numerical multiples of the ‘moments’ of p(x) as usually 
defined. 

This formula expresses v(t) in terms of the simple function 
u(t) and its successive derivatives. 

Ex. Show that, if $p(x) = \frac{h}{\sqrt\pi}e^{-h^2x^2}$, 

$$v(t) = u(t)-\frac{1}{4h^2}u''(t)+\frac{1}{32h^4}u^{\mathrm{IV}}(t)+\dots.$$
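
The series of the last example can be tested on a polynomial population, for which every expansion terminates. The sketch below is an illustration under our own assumptions (polynomials held as coefficient lists, exact rational arithmetic, hypothetical function names): it builds the sample from equation (4) with the Gaussian moment coefficients $m_{2r} = 1/\{(2h)^{2r}r!\}$, then recovers the population from the first three terms of the inverse series.

```python
from fractions import Fraction
from math import factorial

def derivative(poly):
    """Differentiate a polynomial given as coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(poly)][1:] or [Fraction(0)]

def nth_derivative(poly, n):
    for _ in range(n):
        poly = derivative(poly)
    return poly

def add_scaled(a, b, scale):
    """Return a + scale*b as a coefficient list."""
    size = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (size - len(a))
    b = list(b) + [Fraction(0)] * (size - len(b))
    return [x + scale * y for x, y in zip(a, b)]

def gaussian_moment_coefficient(r, h):
    """m_r for p(x) = (h/sqrt(pi)) exp(-h^2 x^2): zero for odd r."""
    if r % 2:
        return Fraction(0)
    return Fraction(1, factorial(r // 2)) / (2 * h) ** r

def sample_from_population(v, h, order=8):
    """Equation (4): u = v + m1 v' + m2 v'' + ... (finite for a polynomial v)."""
    u = list(v)
    for r in range(1, order + 1):
        u = add_scaled(u, nth_derivative(v, r), gaussian_moment_coefficient(r, h))
    return u

def population_from_sample(u, h):
    """v = u - u''/(4h^2) + u''''/(32h^4), the series truncated at this point."""
    v = add_scaled(u, nth_derivative(u, 2), -Fraction(1) / (4 * h ** 2))
    return add_scaled(v, nth_derivative(u, 4), Fraction(1) / (32 * h ** 4))
```

For a polynomial of degree five or less the truncated series is exact, since the next term involves the sixth derivative of u.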


If the hypothetical population is itself Gaussian, i.e. if v(t) is 
of the form $\frac{h}{\sqrt\pi}e^{-h^2t^2}$, then irrespective of the method of selection, 
it follows from the foregoing that we should be able to expand 
a given sample function in a series of terms consisting of 
numerical multiples of $e^{-h^2x^2}$ and its successive derivatives. We 
may invert this process; in fact formula (1) shows that if a 
sample of a continuous variable is assumed to be Gaussian, then 
the hypothetical population can be expressed as a series of 
linear combinations of $e^{-h^2x^2}$ and its derivatives. In both cases 
the coefficients in the expansion are definite numerical multiples 
of the moment coefficients of the probability distribution used 
in the process of selecting the sample from the hypothetical 
population. It remains, therefore, to examine the procedure 
to be followed in order to expand a given function in the 
manner described. 


The Hermite Polynomials 

Consider the function $y = e^{-\frac12x^2}$ (in which, for simplicity, we 
have written $h = 1/\sqrt2$ and omitted the factor $1/\sqrt{2\pi}$). The 
first derivative of y with regard to x is 

$$\frac{dy}{dx} = -xe^{-\frac12x^2}.$$

The second derivative is 

$$\frac{d^2y}{dx^2} = -\frac{d}{dx}\bigl(xe^{-\frac12x^2}\bigr) = (x^2-1)e^{-\frac12x^2};$$

and in general the nth derivative is 

$$\frac{d^ny}{dx^n} = (-1)^ne^{-\frac12x^2}\Bigl[x^n-\frac{n(n-1)}{2}x^{n-2}
  +\frac{n(n-1)(n-2)(n-3)}{2\cdot4}x^{n-4}-\dots\Bigr]. \tag{6}$$

The expression 

$$x^n-\frac{n(n-1)}{2}x^{n-2}+\frac{n(n-1)(n-2)(n-3)}{2\cdot4}x^{n-4}-\dots$$

which occurs in (6) is called the Hermite polynomial of order 
n, and is denoted by $H_n(x)$. It is easily shown that $H_n(x)$ 
satisfies the differential equation 


$$\frac{d^2H_n(x)}{dx^2}-x\frac{dH_n(x)}{dx}+nH_n(x) = 0, \tag{7}$$

and the recurrence relation 

$$H_{n+1}(x)-xH_n(x)+nH_{n-1}(x) = 0. \tag{8}$$

For since $y = e^{-\frac12x^2}$, 

$$\frac{dy}{dx} = -xe^{-\frac12x^2} = -xy.$$

Differentiating this result n times, we have 

$$\frac{d^{n+1}y}{dx^{n+1}}+x\frac{d^ny}{dx^n}+n\frac{d^{n-1}y}{dx^{n-1}} = 0,$$

which, since $\frac{d^ny}{dx^n} = (-1)^nH_n(x)\,y$, is equivalent to (8). 
We have also 

$$\frac{d^{n+2}y}{dx^{n+2}}+x\frac{d^{n+1}y}{dx^{n+1}}+(n+1)\frac{d^ny}{dx^n} = 0,$$

or 

$$\frac{d^2}{dx^2}\{H_n(x)y\}+x\frac{d}{dx}\{H_n(x)y\}+(n+1)H_n(x)y = 0.$$

Hence 

$$y\frac{d^2H_n(x)}{dx^2}+2\frac{dH_n(x)}{dx}\frac{dy}{dx}+H_n(x)\frac{d^2y}{dx^2}
  +xy\frac{dH_n(x)}{dx}+xH_n(x)\frac{dy}{dx}+(n+1)H_n(x)y = 0.$$

If we insert the values of $dy/dx$ and $d^2y/dx^2$ and divide by y, this 
becomes 

$$\frac{d^2H_n(x)}{dx^2}-2x\frac{dH_n(x)}{dx}+(x^2-1)H_n(x)+x\frac{dH_n(x)}{dx}
  -x^2H_n(x)+(n+1)H_n(x) = 0,$$

which reduces to (7). 
By means of (8) we can compute $H_n(x)$ for successive values 
of n, since $H_0(x)$ and $H_1(x)$ are both known. 
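
As an illustration of this computation, the Python sketch below builds the coefficients of $H_0, H_1, \dots$ from the recurrence relation (8) and, independently, from the explicit series in (6). The function names and the representation of a polynomial as a list of coefficients (constant term first) are our own choices.

```python
from math import factorial

def hermite_polynomials(max_order):
    """Coefficient lists of H_0 ... H_max_order from H_{n+1} = x*H_n - n*H_{n-1}."""
    polys = [[1], [0, 1]]                       # H_0 = 1, H_1 = x
    for n in range(1, max_order):
        x_times_hn = [0] + polys[n]             # multiply H_n by x
        n_times_prev = [n * c for c in polys[n - 1]] + [0, 0]
        polys.append([a - b for a, b in zip(x_times_hn, n_times_prev)])
    return polys[: max_order + 1]

def hermite_closed_form(n):
    """Coefficients of x^n - n(n-1)/2 x^(n-2) + n(n-1)(n-2)(n-3)/(2.4) x^(n-4) - ..."""
    coeffs = [0] * (n + 1)
    for j in range(n // 2 + 1):
        size = factorial(n) // (2 ** j * factorial(j) * factorial(n - 2 * j))
        coeffs[n - 2 * j] = (-1) ** j * size
    return coeffs
```

Thus `hermite_polynomials(4)[4]` represents $x^4-6x^2+3$, and the two routes agree for every order tried.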

It follows from the expression for $d^ny/dx^n$ that the curves 

$$y = \frac{d^{2r}}{dx^{2r}}\bigl(e^{-\frac12x^2}\bigr)$$

are all symmetrical, while the curves 

$$y = \frac{d^{2r+1}}{dx^{2r+1}}\bigl(e^{-\frac12x^2}\bigr)$$

are skew (see Figs. 23, 24). 

Fig. 23 

It is clear that $y = xe^{-\frac12x^2}$ can represent a probability curve, since 

$$\int_0^{\infty} xe^{-\frac12x^2}\,dx = \bigl[-e^{-\frac12x^2}\bigr]_0^{\infty} = 1.$$

The mode or maximum value of y occurs when x = 1 and is thus $e^{-\frac12}$. 

Fig. 24 


The importance of the Hermite polynomials, from our point 
of view, lies in the fact that any given frequency function f(x) 
which satisfies certain very general conditions† may be expanded 
in a series of the form 

$$f(x) = a_0e^{-\frac12x^2}+a_1e^{-\frac12x^2}H_1(x)+a_2e^{-\frac12x^2}H_2(x)+\dots, \tag{9}$$

where $a_0, a_1, \dots$ are constants. 

To obtain the coefficient $a_n$, multiply both sides of this 
identity by $H_n(x)$. Integrating, we have 

$$\int f(x)H_n(x)\,dx = \sum_{i=0}^{\infty}a_i\int e^{-\frac12x^2}H_i(x)H_n(x)\,dx. \tag{10}$$

Now 

$$\int_{-\infty}^{\infty}e^{-\frac12x^2}H_m(x)H_n(x)\,dx
  = \Bigl[(-1)^nH_m(x)\frac{d^{n-1}}{dx^{n-1}}\bigl(e^{-\frac12x^2}\bigr)\Bigr]_{-\infty}^{\infty}
  -\int_{-\infty}^{\infty}(-1)^nH_m'(x)\frac{d^{n-1}}{dx^{n-1}}\bigl(e^{-\frac12x^2}\bigr)\,dx,$$

on integration by parts, 

$$= \int_{-\infty}^{\infty}e^{-\frac12x^2}H_m'(x)H_{n-1}(x)\,dx,$$

in virtue of the fact that the integrated part vanishes at both 
limits. Proceeding thus, we obtain, if $n \ge m$, 

$$\int_{-\infty}^{\infty}e^{-\frac12x^2}H_m(x)H_n(x)\,dx
  = \int_{-\infty}^{\infty}e^{-\frac12x^2}H_m^{(m)}(x)H_{n-m}(x)\,dx. \tag{11}$$

Now, if n = m, we have 

$$H_n^{(n)}(x) = n! \quad\text{and}\quad H_0(x) = 1.$$

Thus the integral reduces to $n!\int_{-\infty}^{\infty}e^{-\frac12x^2}\,dx = n!\sqrt{2\pi}$. 

If, instead, n > m, integration of (11) gives us 

$$\int_{-\infty}^{\infty}e^{-\frac12x^2}H_m(x)H_n(x)\,dx
  = \int_{-\infty}^{\infty}e^{-\frac12x^2}H_m^{(m+1)}(x)H_{n-m-1}(x)\,dx. \tag{12}$$

But since $H_m(x)$ is a polynomial of degree m, $H_m^{(m+1)}(x)$ is zero; 
thus the left-hand side of (12) is also zero. 

Returning now to (10), we see that if $i \ne n$, the coefficient 
of $a_i$ vanishes, by (12), while if i = n, it is equal to $n!\sqrt{2\pi}$. 
Hence we obtain from (10), 

$$\int_{-\infty}^{\infty}f(x)H_n(x)\,dx = a_n\,n!\sqrt{2\pi},$$

giving 

$$a_n = \frac{1}{n!\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)H_n(x)\,dx.$$

† It is sufficient that f(x), f'(x), and f''(x) should be finite and continuous in 
$(-\infty, \infty)$ and that f(x) and its derivatives should vanish at $x = \pm\infty$. 

It will be observed that the method of determining a n follows 
closely that of obtaining the coefficients in a Fourier expansion. 
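
The orthogonality relations and the formula for $a_n$ lend themselves to a numerical check. In the following Python sketch (illustrative only; the trapezoidal rule, the cut-off at |x| = 12 and the function names are assumptions of ours) the integrals are evaluated by simple quadrature and the coefficients of a test function of known expansion are recovered.

```python
import math

def hermite(n, x):
    """H_n(x) from the recurrence H_{n+1} = x*H_n - n*H_{n-1}."""
    prev, cur = 1.0, x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur

def integrate(func, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoidal rule; the integrands used here are negligible beyond |x| = 12."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for j in range(1, steps):
        total += func(lo + j * width)
    return total * width

def weighted_product(m, n):
    """Integral of exp(-x^2/2) * H_m(x) * H_n(x) over the whole line."""
    return integrate(lambda x: math.exp(-0.5 * x * x) * hermite(m, x) * hermite(n, x))

def coefficient(f, n):
    """a_n = (1 / (n! sqrt(2 pi))) * integral of f(x) H_n(x) dx."""
    scale = math.factorial(n) * math.sqrt(2.0 * math.pi)
    return integrate(lambda x: f(x) * hermite(n, x)) / scale
```

For instance, with $f(x) = e^{-\frac12x^2}\{2+\tfrac12H_2(x)\}$ the routine `coefficient` returns values close to 2, 0 and ½ for n = 0, 1, 2.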

Thus, when the sample is expressed as a series of derivatives 
of $e^{-\frac12x^2}$, the hypothetical population will itself be expressible 
in this form. The cases we have dealt with above are the comparatively 
simple ones in which one function or the other is 
Gaussian. 


Standard Deviation for Bernoullian Populations 
In this case the standard deviation $\sigma$ from the average is 
given by (p. 62) 

$$\sigma^2 = \sum_{r=0}^{n}{}^nC_r\,p^rq^{n-r}(np-r)^2$$

$$= n^2p^2\sum{}^nC_r\,p^rq^{n-r}-2np\sum{}^nC_r\,p^rq^{n-r}\,r+\sum{}^nC_r\,p^rq^{n-r}\,r^2. \tag{1}$$

From the identity 

$$(p+q)^n = \sum{}^nC_r\,p^rq^{n-r},$$

we find as on p. 63 that 

$$np(p+q)^{n-1} = \sum{}^nC_r\,p^rq^{n-r}\,r, \tag{2}$$

giving $np = \sum{}^nC_r\,p^rq^{n-r}\,r$. 

Differentiating (2) with respect to p, 

$$n(p+q)^{n-1}+n(n-1)p(p+q)^{n-2} = \sum{}^nC_r\,p^{r-1}q^{n-r}\,r^2. \tag{3}$$

Hence, substituting from (2) and (3) in (1) we obtain 

$$\sigma^2 = n^2p^2-2n^2p^2+np+n(n-1)p^2 = np(1-p) = npq. \tag{4}$$
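
Result (4) may be confirmed by evaluating the sum in (1) directly. The short Python sketch below does so in exact rational arithmetic; the function name is ours and the values of n and p in the usage note are arbitrary.

```python
from fractions import Fraction
from math import comb

def bernoulli_variance(n, p):
    """Direct evaluation of sum over r of C(n, r) p^r q^(n-r) (np - r)^2."""
    q = 1 - p
    return sum(comb(n, r) * p ** r * q ** (n - r) * (n * p - r) ** 2
               for r in range(n + 1))
```

With n = 12 and p = 3/10, for example, the sum agrees exactly with npq = 12 × 3/10 × 7/10.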

If we use this value of $\sigma$ to specify a Gaussian population, 
then 

$$h = \frac{1}{\sigma\sqrt2} = \frac{1}{\sqrt{2np(1-p)}}$$

(p. 72). 
Bernoulli’s Limit Theorem 

We have seen that the mean value of the deviation $|r-np|$ is 
equal to $\sqrt{npq}$; since this expression tends to infinity with n, 
it follows that when the number of trials is increased indefinitely, 
the probability of obtaining a deviation which is less than any 
assigned number tends to zero. 

At the same time we observe that the mean value of $\left|\frac{r}{n}-p\right|$ is 
equal to $\sqrt{pq/n}$, and that this expression tends to zero as n tends 
to infinity. We thus obtain the following fundamental result: 

Theorem. When the number n of trials is increased indefinitely, 
the probability that $\left|\frac{r}{n}-p\right|$ will remain less than any assigned 
number approaches unity. 
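
The behaviour described by the theorem can be seen by summing the Bernoulli terms directly for increasing n. The sketch below is illustrative: the function names are ours, and the terms are computed through logarithms (via `math.lgamma`) merely to avoid overflow for large n.

```python
import math

def binomial_term(n, r, p):
    """C(n, r) p^r (1-p)^(n-r) for 0 < p < 1, computed through logarithms."""
    log_term = (math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
                + r * math.log(p) + (n - r) * math.log(1.0 - p))
    return math.exp(log_term)

def prob_within(n, p, eps):
    """Probability that |r/n - p| is less than eps in n trials."""
    return sum(binomial_term(n, r, p)
               for r in range(n + 1) if abs(r / n - p) < eps)
```

Taking p = ½ and an assigned number 0.05, the probability rises steadily as n passes through 10, 100 and 1000, and already exceeds 0.99 at n = 1000.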

This theorem is due to Bernoulli, but it should be noted that 
the information it provides falls far short of what we should have 
liked to obtain. All we can infer is that the probability of obtaining 
at most a given deviation $\left|\frac{r}{n}-p\right|$ differs from unity by less than 
any given small number, provided that n is sufficiently large; an assertion 
which differs essentially from the ‘first empirical assumption’ 
quoted on p. 29, from which the conception of probability has 
been removed. 


Poisson Distributions 

Bernoulli’s formula states that the probability of exactly 
r successes in n trials is 

$$P = {}^nC_r\,p^r(1-p)^{n-r},$$

where p is the probability of an individual event. In this 
formula write $p = \epsilon/n$, so that $\epsilon = np$ is, as we have seen, 
approximately the most probable number of successes. 

Then 

$$P = \frac{n(n-1)\dots(n-r+1)}{r!}\Bigl(\frac{\epsilon}{n}\Bigr)^r\Bigl(1-\frac{\epsilon}{n}\Bigr)^{n-r}$$

$$= \frac{\epsilon^r}{r!}\Bigl(1-\frac{\epsilon}{n}\Bigr)^n\Bigl(1-\frac1n\Bigr)\Bigl(1-\frac2n\Bigr)\dots
    \Bigl(1-\frac{r-1}{n}\Bigr)\Big/\Bigl(1-\frac{\epsilon}{n}\Bigr)^r.$$

We shall now suppose that the events under consideration 
are rare, that is, p is small compared with unity. Hence, in 
order that the most probable number of successes may be 
appreciable, n must be large, since $p = \epsilon/n$. In these circumstances 
$\bigl(1-\frac{\epsilon}{n}\bigr)^n = e^{-\epsilon}$, approximately. 

Now the product $\bigl(1-\frac1n\bigr)\dots\bigl(1-\frac{r-1}{n}\bigr)$ lies between 
unity and $1-\frac{r(r-1)}{2n}$, and thus tends to unity if r(r−1) is small 
compared with 2n, which will be the case if $r^2/2n$ is small, since 
r/n is still smaller. 

Also $\bigl(1-\frac{\epsilon}{n}\bigr)^r = e^{-\epsilon r/n} = 1$ approximately, if n is 
large compared with r. 

It follows that, provided $r^2/2n$ is small compared with unity 
and p is small, the value of P is approximately $e^{-\epsilon}\epsilon^r/r!$. In other 
words, if we select from a large population in which the probability 
p of success is small, then the probability of r successes in n 
trials is given by 

$$P = e^{-\epsilon}\epsilon^r/r! = e^{-np}(np)^r/r!$$

provided that $r^2/2n$ is small. 

This result is known as Poisson’s law of distribution, applicable 
to the case of rare events. 
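
The closeness of the approximation is easily examined numerically. The following Python sketch (function names ours; the values n = 10000 and p = 0.0003 in the usage note are chosen arbitrarily so that $\epsilon = 3$) compares Bernoulli's formula with Poisson's law term by term.

```python
import math

def bernoulli_probability(n, r, p):
    """Exact probability of r successes in n trials."""
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)

def poisson_probability(r, eps):
    """Poisson's law: exp(-eps) * eps^r / r!."""
    return math.exp(-eps) * eps ** r / math.factorial(r)

def largest_discrepancy(n, p, max_r):
    """Largest absolute difference between the two laws for r = 0 ... max_r."""
    return max(abs(bernoulli_probability(n, r, p) - poisson_probability(r, n * p))
               for r in range(max_r + 1))
```

With n = 10000 and p = 0.0003 the two laws differ by less than one part in a thousand for every r up to 8, a range over which $r^2/2n$ remains small.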

Standard Deviation for Poisson's Law 
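The Poisson Law does not represent a true probability distribution, 
since the sum 

$$e^{-\epsilon}\Bigl(1+\frac{\epsilon}{1!}+\frac{\epsilon^2}{2!}+\dots+\frac{\epsilon^n}{n!}\Bigr)$$

is not equal to unity. If, however, n is large, $\sum_{r=0}^{n}\frac{\epsilon^r}{r!}$ is approximately 
equal to $\sum_{r=0}^{\infty}\frac{\epsilon^r}{r!}$, by which it may be replaced. 

To this degree of approximation the average value a of r is 

$$a = \sum_{r=0}^{\infty}r\,\frac{e^{-\epsilon}\epsilon^r}{r!}
    = \epsilon e^{-\epsilon}\sum_{r=1}^{\infty}\frac{\epsilon^{r-1}}{(r-1)!} = \epsilon.$$

Thus the approximate value of the standard deviation $\sigma$ is 
given by 

$$\sigma^2 = \sum_{r=0}^{\infty}(r-\epsilon)^2\,\frac{e^{-\epsilon}\epsilon^r}{r!},$$

or 

$$\sigma^2 = e^{-\epsilon}\Bigl\{\epsilon^2\sum\frac{\epsilon^r}{r!}-2\epsilon\sum\frac{\epsilon^r}{(r-1)!}
  +\sum\frac{r\epsilon^r}{(r-1)!}\Bigr\}$$

$$= e^{-\epsilon}\{\epsilon^2e^{\epsilon}-2\epsilon^2e^{\epsilon}+(\epsilon+\epsilon^2)e^{\epsilon}\} = \epsilon.$$

Hence $\sigma = \sqrt\epsilon$. 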


Ex. 1. Show that the error involved in writing $\sum_{r=0}^{\infty}\frac{\epsilon^r}{r!}$ for $\sum_{r=0}^{n}\frac{\epsilon^r}{r!}$ is less 
than 

$$\frac{e^np^{n+1}(1-p)^{-1}}{\sqrt{2\pi(n+1)}}.$$


Ex. 2. The Telephone Problem. The telephone service in operation 
presents an enormous number of practical problems in probability. 
These are, however, necessarily so technical that a simple case only is 
given in illustration. Suppose that there are n available lines and that, 
on the average, $\epsilon$ of these are in operation at any given moment. Using 
Poisson’s law we find that the probability that at any time exactly r 
lines are in request is 

$$e^{-\epsilon}\epsilon^r/r!.$$

Now, if the average time of duration of a call is T, the probability 
that a call on any particular line will begin in a time dt of this interval 
is dt/T. Hence, the probability that a call will begin in an interval dt 
on any of the $\epsilon$ lines which are, on the average, in operation is $\epsilon\,dt/T$. 

It follows that the probability that in the interval dt exactly r lines 
are in use and an additional line is required is 

$$\frac{e^{-\epsilon}\epsilon^r}{r!}\cdot\frac{\epsilon\,dt}{T}.$$

Clearly, if $r \ge n$, the additional line will not be available and the call 
will be lost. Thus the probability of a call being lost on this occasion is 

$$\frac{\epsilon\,dt}{T}\sum_{r=n}^{\infty}\frac{e^{-\epsilon}\epsilon^r}{r!}.$$

Since the probability of a call arriving in the interval dt is $\epsilon\,dt/T$, we 
conclude that the probable proportion of lost calls is 

$$\sum_{r=n}^{\infty}\frac{e^{-\epsilon}\epsilon^r}{r!}.$$
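
The proportion of lost calls is simply the tail of Poisson's law from r = n onwards, and may be computed as in the Python sketch below. The function name, the parameter names and the number of terms retained are assumptions made for illustration.

```python
import math

def lost_call_proportion(lines, eps, extra_terms=200):
    """Sum of exp(-eps) * eps^r / r! over r >= lines."""
    term = math.exp(-eps)                  # the r = 0 term
    for r in range(1, lines + 1):          # advance to the r = lines term
        term *= eps / r
    total = 0.0
    for r in range(lines, lines + extra_terms):
        total += term
        term *= eps / (r + 1)
    return total
```

For example, with 5 lines of which 2 are in operation on the average, the proportion is $1-7e^{-2}$, or a little over five per cent.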


Ex. 3. From a given population of N numbers $x_1, x_2, \dots, x_N$ a sample 
of magnitude n is selected. Denoting its mean by $m_r$, show that the 
mean of all the ${}^NC_n$ such means $m_r$ is equal to the mean M of the original 
population. 

Ex. 4. Prove that the standard deviation $\sigma_n$ of the $m_r$’s is given by 

$$\sigma_n^2 = \sum(M-m_r)^2\big/{}^NC_n
  = \frac{N-n}{nN^2}\sum x_i^2-\frac{2(N-n)}{nN^2(N-1)}\sum_{i<j}x_ix_j.$$

Ex. 5. Deduce that, if $\sigma$ is the standard deviation of the x’s, 

$$\sigma_n^2 = \frac{N-n}{n(N-1)}\,\sigma^2,$$

so that, for large values of N, $\sigma_n$ is approximately 
equal to $\sigma/\sqrt n$. (Cf. the result on p. 130, on an entirely different assumption.) 
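
The results of Ex. 3 and Ex. 5 can be verified by exhaustive enumeration for a small population. The Python sketch below uses exact fractions; the function names and the particular numbers in the usage note are ours.

```python
from fractions import Fraction
from itertools import combinations

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    """Mean squared deviation from the mean (divisor equal to the number of values)."""
    values = list(values)
    centre = mean(values)
    return mean((v - centre) ** 2 for v in values)

def sample_means(population, n):
    """Means of all the C(N, n) samples of magnitude n."""
    return [mean(sample) for sample in combinations(population, n)]
```

Taking the six numbers 2, 3, 5, 8, 13, 21 and samples of magnitude 3, the twenty sample means have the same mean as the population, and their variance is exactly (N − n)/{n(N − 1)} times that of the population.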

Ex. 6. Given n readings $x_1, x_2, \dots, x_n$ with mean m, we call the quantities 
$v_i = m-x_i$ the respective residuals. Supposing that M is the true 
value of m for all possible readings, we call the quantities $\epsilon_i = M-x_i$ the 
corresponding errors. Establish the formula 

$$v_i = \frac{n-1}{n}\,\epsilon_i-\frac1n\sum_{j\ne i}\epsilon_j.$$

Assuming now that the $\epsilon$’s are normally distributed, with precision 
constant h, deduce from the result of p. 129 that the precision constant 
h′ for the v’s is given by $\frac{1}{h'^2} = \frac{n-1}{n}\,\frac{1}{h^2}$, and hence that the standard 
deviation for the $\epsilon$’s is $\sqrt{\sum v_i^2/(n-1)}$. 





CHAPTER IX 


THE USE OF PROBABILITY IN SCIENTIFIC 
INDUCTION 

1. The general problem 

All scientific conclusions are arrived at by a combination of 
inductive and deductive processes. The experimenter provides 
the data, the mathematician accepts them and offers a hypothesis 
which links them together, and then by mathematically 
deductive reasoning draws certain conclusions from them. From 
the point of view of mathematical technique a deduction has 
been made; from that of scientific method, in stating a hypothesis 
which outruns the experimental data alone, an induction 
is involved. The mathematician has deduced certain consequences, 
and, offering them to the experimenter as possible 
truths, demands their physical verification or disproof. The 
experimenter deduces by his particular method that they are 
true in his particular circumstances; and together they pass to 
the inductive stage that the hypothesis outstripping even these 
new facts is still true in the sense that it is a valid guide to the 
next step. 

Thus we discover three elements in any scientific problem: 

(1) A set of data, given as the result of experiment: we refer 
to these as the ‘sample’. 

(2) A wider field (‘the population’) of possible data from 
which (1) has been selected.† 

(3) A hypothesis or hypothetical law tentatively presumed 
to govern the structure of (2). 

Stated in this way, the problem appears in a form detached 
from the experimental methods which are necessary to collect 
the data and from the use to which (3) is to be put. For 
example, on account of the imperfections of their apparatus 
the experimenters may incorporate in a reading at time t, 
say, readings over a time t ± t′. Or the data may be such as 
to require classification as of length l when in fact the actual 
lengths vary about l. It follows that there is apparently a 
fourth factor in the situation which requires to be considered 
if (1), (2), and (3) are to appear as steps in the scientific process, 
namely, 

(4) The process of selecting or ‘sampling’ the data. 

† In this connexion see the limitations of this principle in many cases of 
physical science (Ch. VIII). 

The way in which these four elements are associated can 
be shown in mathematical form. Let us imagine an original 
population to consist entirely of elements having a common 
characteristic measured by the variable t; and suppose that 
this characteristic occurs at values $t_1, t_2, \dots, t_n$ with frequencies 
$V(t_1), V(t_2), \dots$, where for the moment $t_1, t_2, \dots$ are integers. The 
total size of the population is thus 

$$V(t_1)+V(t_2)+\dots+V(t_n).$$

In a problem of scientific induction we do not know the form 
of V(t); we can only speculate on it by means of (1) and (3). 
Suppose, however, that a set of data U(t) has been collected, 
covering the whole range of t: thus, $U(t_r)$ is the frequency with 
which the data are collected at what the experimenter believes 
to be the value $t_r$. We have used the word ‘believes’ designedly 
because what the experimenter does in practice is to sweep into 
his reading at, say, $t_1$ a number of readings at $t_1\pm1$, $t_1\pm2$, etc.; 
this inclusion of false data is not within his control, for he acts 
on the assumption that he is obtaining correct data at the given 
value of t. 

We shall suppose that the false data are swept into the true 
readings according to some particular law; thus, let the unknown 
law which describes the proportions of readings at 
neighbouring positions included in the reading at t be p(s), 
where s is the interval between the readings at t and t+s. 
Since V(t+s) is the number of readings which occur at t+s, 
the number of these which are accepted as being at t is 
V(t+s)p(s). 

It follows that the frequency U(t) of the samples found at t 
is the sum of all terms of the type V(t+s)p(s), where s takes all 
possible values about the position t. It is clear that a good 
experimenter will have so designed his experiment that very 
few, and small, values of s occur; mathematically this implies 
simply that p(s) is always zero beyond a particular range of s. 
With this understanding we can write 

$$U(t) = \sum_{s=-\infty}^{\infty}V(t+s)\,p(s). \tag{1}$$

Ex. 1. Suppose that the sample is obtained by including with half 
the values which actually occur at t, a quarter of those which occur on 
both sides. Then the function p(s) is defined by the properties 

$$p(0) = \tfrac12,\quad p(1) = p(-1) = \tfrac14,\quad\text{and}\quad p(s) = 0$$

for all other values of s. It follows from (1) that 

$$U(t) = V(t-1)p(-1)+V(t)p(0)+V(t+1)p(1),$$

i.e. 

$$2U(t) = V(t)+\tfrac12[V(t-1)+V(t+1)].$$

Thus, if, for example, V(t) = t(10 − t), then 

$$2U(t) = 20t-2t^2-1.$$
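
Equation (1) translates directly into a few lines of Python. In the sketch below (an illustration only; the function `sample` and the dictionary representation of the law p(s) are our own devices) the law of Ex. 1 is applied to the population V(t) = t(10 − t).

```python
from fractions import Fraction

def sample(population, law):
    """Return U where U(t) = sum over s of V(t+s) * p(s); `law` maps s to p(s)."""
    def u(t):
        return sum(weight * population(t + s) for s, weight in law.items())
    return u

# The law of Ex. 1: one half at s = 0 and one quarter on either side.
QUARTER_HALF_QUARTER = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}

def example_population(t):
    return t * (10 - t)
```

Here `sample(example_population, QUARTER_HALF_QUARTER)` reproduces $2U(t) = 20t-2t^2-1$ at every integral t.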

Ex. 2. If V(t) is given by the table 


t 

I 0 1 1 I 2 

3 

4 

5 

6 

V(t) 

W 

11 

14 

11 

7 


calculate the nature of the sample U(t) for values of t from t = 1 to 
t = 6, using the method of selection in Ex. 1. 

The above examples illustrate the simple problem of deducing 
the sample when the structure of the original population and the 
mode of selection are specified. 

We are now in a position to restate our previous remarks in 
symbolical form. We are confronted with the following problem: 
If a sample distribution U(t) has been found and a 
method of selection p(s) postulated, what can be deduced about 
the original population V(t)? 

If the operator E is defined by the relation 

$$Ef(t) = f(t+1),$$

so that 

$$E^sf(t) = f(t+s),$$

then (1) becomes 

$$U(t) = \sum_{s=-\infty}^{\infty}V(t+s)\,p(s) = \Bigl\{\sum_{s=-\infty}^{\infty}E^s\,p(s)\Bigr\}V(t).$$

If the infinite series within the brackets has the formal sum 
$\phi(E)$, then we obtain† 

$$U(t) = \phi(E)V(t),$$

whence, by the method of operators, the particular solution 
of this equation is 

$$V(t) = \phi^{-1}(E)U(t). \tag{2}$$

Now 

$$Ef(t) = f(t+1)-f(t)+f(t) = \Delta f(t)+f(t) = (\Delta+1)f(t),$$

in the notation of differences. 

Suppose that $\phi^{-1}(E) = \phi^{-1}(1+\Delta)$ can be expanded in ascending 
powers of $\Delta$. Then the function V(t) which represents the original population 
is expressed in terms of the sample U(t) and its differences. 
We note that if U(t) can be represented as a polynomial in t, 
then all its differences beyond a certain power are zero and 
V(t) is expressed in finite terms. 

† Cf. Chapter II, p. 29. 

Ex. 3. Consider the equation given in Ex. 1 above. We have 

$$2U(t) = \{\tfrac12(E^{-1}+E)+1\}V(t) = \frac{E^2+2E+1}{2E}\,V(t).$$

Thus 

$$V(t) = \frac{4E}{(E+1)^2}\,U(t) = (1+\Delta)(1+\tfrac12\Delta)^{-2}U(t)$$

$$= (1+\Delta)(1-\Delta+\tfrac34\Delta^2\dots)U(t)$$

$$= (1-\tfrac14\Delta^2\dots)U(t).$$

If, for example, $U(t) = 9-t^2$ in the range (−3, 3), then 

$$V(t) = 9-t^2+\tfrac12.$$
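
The operator result of Ex. 3 may be checked as follows. The Python sketch is illustrative (function names ours): it recovers V from U by means of $(1-\tfrac14\Delta^2)$, which is exact when U is at most quadratic because all higher differences then vanish, and it re-applies the law of selection of Ex. 1 to confirm that the sample is regained.

```python
from fractions import Fraction

def second_difference(u, t):
    """Delta^2 U(t) = U(t+2) - 2 U(t+1) + U(t)."""
    return u(t + 2) - 2 * u(t + 1) + u(t)

def recover_population(u):
    """V(t) = (1 - Delta^2 / 4) U(t); exact when U is at most quadratic in t."""
    return lambda t: u(t) - Fraction(1, 4) * second_difference(u, t)

def apply_selection(v):
    """The law of Ex. 1: a quarter, a half and a quarter of adjacent readings."""
    return lambda t: (Fraction(1, 4) * v(t - 1) + Fraction(1, 2) * v(t)
                      + Fraction(1, 4) * v(t + 1))
```

Starting from $U(t) = 9-t^2$ the recovered population is $9\tfrac12-t^2$, as stated above.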

Ex. 4. Suppose that $p(s) = e^{-s}$ ($s \ge 0$) and that p(s) = 0 (s < 0). 
Then 

$$U(t) = \sum_{s=0}^{\infty}\Bigl(\frac{E}{e}\Bigr)^sV(t) = \frac{1}{1-E/e}\,V(t).$$

Hence 

$$V(t) = \Bigl(1-\frac{E}{e}\Bigr)U(t) = U(t)-\frac1e\,U(t+1).$$


We return now to consider the general solution of equation 
(1). This consists of the particular solution (2) and a ‘complementary 
function’, the solution of 

$$\phi(E)V(t) = 0.$$

This function is to some extent arbitrary in character, as is 
seen by the following examples. 

Ex. 1. Suppose that $U(t) = 15-t^2$, and that the sample is 
obtained from the population V(t) by the law 

$$U(t) = \tfrac12[V(t+1)+V(t-1)],$$

where the range of values of t required for the evaluation of 
U(t) is given by $-2 \le t \le 2$. 

We have to solve the equation 

$$(E^2+1)V(t-1) = 2(15-t^2).$$

We thus obtain for the general solution 

$$V(t) = 16-t^2+A\cos\tfrac12\pi t+B\sin\tfrac12\pi t,$$

where A and B are arbitrary constants or functions of period 
unity. 

Now V(t) must remain positive over the whole range of t 
required, namely $-3 \le t \le 3$. We shall see that this condition 
may be secured by taking A = 0; for in that case, V(t) will 
remain positive in the required range, provided that B satisfies 
the condition $-7 < B < 7$. 

Hence there is an infinity of solutions to our equation satisfying 
the given conditions for U(t). 

Ex. 2. That a hypothetical population cannot always be 
found may be seen from the following example. Suppose instead 
that $U(t) = 16-t^2$, and that the law of selection is the same 
as before. Since U(t) must be positive, we require $-4 < t < 4$. 
The general solution of the equation for V(t) is found to be 

$$V(t) = 17-t^2+A\cos\tfrac12\pi t+B\sin\tfrac12\pi t.$$

With our law of selection V(t) must certainly be positive in 
the range $-5 \le t \le 5$. But, substituting t = 5 and t = −5 
in the solution, this necessitates B > 8 and B < −8, which is 
impossible. It follows that, with the given law of selection, no 
population can be found to yield the given sample. 
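
Both of the foregoing examples can be examined numerically. In the Python sketch below (illustrative; the function names, the restriction to integral values of t, the choice A = 0 and the grid of trial values of B are all assumptions of ours) the general solution is tested against the law of selection, and the admissible values of B are searched for directly.

```python
import math

def population(t, constant, a, b):
    """General solution V(t) = constant - t^2 + A cos(pi t/2) + B sin(pi t/2)."""
    return (constant - t * t + a * math.cos(0.5 * math.pi * t)
            + b * math.sin(0.5 * math.pi * t))

def sample_from(v, t):
    """The law of selection U(t) = (V(t+1) + V(t-1)) / 2."""
    return 0.5 * (v(t + 1) + v(t - 1))

def admissible_b(constant, reach, candidates):
    """Values of B (with A = 0) keeping V positive at every integer |t| <= reach."""
    return [b for b in candidates
            if all(population(t, constant, 0.0, b) > 0
                   for t in range(-reach, reach + 1))]
```

With trial values of B running from −20 to 20 in steps of ½, the first example admits every value between −6½ and 6½, while the second admits none.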

Ex. 3. If $3U(t) = V(t-1)+V(t)+V(t+1)$, then 

$$3U(t) = (E^{-1}+1+E)V(t) = \frac{E^2+E+1}{E}\,V(t).$$

The complementary function is evidently 

$$V(t) = A\omega_1^t+B\omega_2^t, \tag{1}$$

where A and B are arbitrary and $\omega_1$, $\omega_2$ are the roots of the equation 

$$E^2+E+1 = 0,$$

i.e. the complex cube roots of unity. 

If we write $\omega_1 = \cos\tfrac23\pi+i\sin\tfrac23\pi$, $\omega_2 = \cos\tfrac23\pi-i\sin\tfrac23\pi$, (1) may be 
expressed in the form 

$$V(t) = A\cos(\tfrac23\pi t+\alpha),$$

where A and $\alpha$ are arbitrary constants. 

The particular solution is given by 

$$V(t) = \frac{3E}{E^2+E+1}\,U(t) = \frac{3(1+\Delta)}{3+3\Delta+\Delta^2}\,U(t),$$

so that 

$$V(t) = (1+\Delta)(1+\Delta+\tfrac13\Delta^2)^{-1}U(t)$$

$$= (1+\Delta)(1-\Delta-\tfrac13\Delta^2+\Delta^2\dots)U(t)$$

$$= (1-\tfrac13\Delta^2)U(t) \tag{2}$$

if higher differences of U(t) may be neglected. 

To this order of approximation, the general solution of the equation is 

$$V(t) = A\cos(\tfrac23\pi t+\alpha)+U(t)-\tfrac13\Delta^2U(t). \tag{3}$$


It is clear that the determination of the hypothetical population
(3) above is equivalent to the process of graduating or
‘smoothing’ the errors introduced by the selective process, as
is explained in a later section. Our sample $U(t)$ has been
formed by taking the mean of three adjacent ordinates of the
histogram $V(t)$, and our solution (3) represents analytically a
reversal of this process. If we confine our attention to the
particular solution, for which $A = 0$, we note that when $U(t)$ is
a linear function of $t$, $\Delta^2 U(t)$ is zero, so that $V(t) = U(t)$. When
$U(t)$ is a quadratic function of $t$, $V(t)$ and $U(t)$ differ only by a
constant.
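The two closing remarks can be verified exactly for sample functions. The sketch below applies the particular solution $V(t) = U(t) - \tfrac{1}{3}\Delta^2 U(t)$ to a linear and a quadratic $U(t)$ (both chosen arbitrarily for illustration) and confirms that the three-point mean of $V$ returns $U$; the function names are hypothetical.

```python
from fractions import Fraction

def second_difference(f, t):
    """Forward second difference: f(t+2) - 2 f(t+1) + f(t)."""
    return f(t + 2) - 2 * f(t + 1) + f(t)

def recovered_population(U):
    """Particular solution (A = 0): V(t) = U(t) - (1/3) * second difference of U."""
    return lambda t: U(t) - Fraction(1, 3) * second_difference(U, t)

def three_point_mean(V, t):
    """The law of selection of Ex. 3: mean of V at t-1, t, t+1."""
    return (V(t - 1) + V(t) + V(t + 1)) / 3

linear = lambda t: 4 * t - 1              # illustrative linear sample
quadratic = lambda t: 2 * t * t - 3 * t + 7   # illustrative quadratic sample

V_linear = recovered_population(linear)
V_quadratic = recovered_population(quadratic)
```

For the quadratic sample the second difference is the constant 4, so $V$ lies below $U$ by $\tfrac{4}{3}$ everywhere.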


Ex. 4. Find the original population $V(t)$, given that each reading
shown for $U(t)$ is the true reading at $t$ plus $\tfrac{1}{10}$th of the true reading
at $t+1$.

  t       0     1     2     3     4     5     6     7     8     9
  U(t)    13    22.8  31.2  35.5  38.6  39.5  38.2  34.8  30    21.1

We have to solve the equation

$$U(t) = V(t) + \tfrac{1}{10}V(t+1).$$

For the readings shown this gives the solution

$$V(t) = 36 - (t-5)^2.$$


Bernoullian Law of Selection 

Consider first the case in which $p(s) = {}^{2}C_{s+1}\,p^{s+1}(1-p)^{1-s}$,
where $p$ is a given positive fraction. The equation (1), p. 148,
then becomes

$$U(t) = (1-p)^2\,V(t-1) + 2p(1-p)\,V(t) + p^2\,V(t+1).$$

We have thus applied a Bernoulli process of selection to the
set of three consecutive ordinates of the histogram $V(t)$ in
order to obtain the sample $U(t)$; and if $\tfrac{1}{3} < p < \tfrac{2}{3}$, the ordinate
at $t$ is swept into the readings with a greater probability than
either of the adjacent ordinates. If we put $q = 1-p$, the
equation may be written symbolically as

$$U(t) = (q^2/E + 2qp + p^2E)\,V(t),$$

or

$$E\,U(t) = (q + pE)^2\,V(t) = (1 + p\Delta)^2\,V(t).$$

Thus the particular solution is given by

$$V(t) = E(1 + p\Delta)^{-2}\,U(t)$$
$$= (1+\Delta)[1 - 2p\Delta + 3p^2\Delta^2 - \ldots + (-1)^{n-1}np^{n-1}\Delta^{n-1} + \ldots]\,U(t)$$
$$= \Bigl\{1 + \sum_{n=1}^{\infty}(-1)^n p^{n-1}[(n+1)p - n]\Delta^n\Bigr\}U(t).$$

Suppose generally that $p(s)$ is defined by the formula

$$p(s) = {}^{n}C_{s+m}\,p^{s+m}q^{n-m-s},$$

where $n$ and $m$ are given numbers. There will now be $n+1$
terms on the right-hand side of (1), which becomes

$$U(t) = \sum_{s=-m}^{n-m} {}^{n}C_{s+m}\,p^{s+m}q^{n-m-s}\,V(t+s),$$

or, symbolically,

$$U(t) = E^{-m}(q + pE)^n\,V(t) = E^{-m}(1 + p\Delta)^n\,V(t).$$

Hence the particular solution is

$$V(t) = E^{m}(1 + p\Delta)^{-n}\,U(t) = (1 + p\Delta)^{-n}\,U(t+m).$$

The general solution is thus

$$V(t) = (1 + p\Delta)^{-n}\,U(t+m) + \Bigl(-\frac{q}{p}\Bigr)^{t}\{A_1 + A_2 t + \ldots + A_n t^{n-1}\},$$

where $A_1, A_2, \ldots$ are arbitrary constants or functions of period
unity.
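The series for the three-ordinate Bernoulli case can be tested exactly when $U(t)$ is a polynomial, since its differences then vanish beyond a finite order. The following sketch uses an arbitrary cubic sample and an arbitrary value of $p$; both, like the function names, are illustrative assumptions.

```python
from fractions import Fraction

def forward_difference(f):
    """Return the function t -> f(t+1) - f(t)."""
    return lambda t: f(t + 1) - f(t)

def invert_bernoulli(U, p, max_order):
    """V(t) = U(t) + sum_{n>=1} (-1)^n p^(n-1) [(n+1)p - n] Delta^n U(t)."""
    def V(t):
        total = Fraction(U(t))
        diff = U
        for n in range(1, max_order + 1):
            diff = forward_difference(diff)
            total += (-1) ** n * p ** (n - 1) * ((n + 1) * p - n) * diff(t)
        return total
    return V

def select(V, p, t):
    """Bernoulli law of selection on three ordinates: q^2, 2pq, p^2."""
    q = 1 - p
    return q * q * V(t - 1) + 2 * p * q * V(t) + p * p * V(t + 1)

p = Fraction(2, 5)                       # illustrative value of p
U = lambda t: t**3 - 2 * t + 5           # illustrative cubic sample
V = invert_bernoulli(U, p, max_order=3)  # differences above the third vanish
```

Applying `select` to the recovered `V` returns the original `U` at every integer `t`, which is the sense in which the series inverts the selection.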

Bayes’s Theorem 

Bayes’s theorem, which by its misapplication has attained 
a certain notoriety in the history of probability, follows at 
once from the foregoing discussion. In Fig. 25 the population
$V(t)$, from which the sample $U(t)$ is drawn, may be regarded
as contributing its quota to the sample at $t$ in the proportions
indicated. As we have seen, the total sample is

$$U(t) = \sum_s V(t+s)\,p(s).$$

The contribution to this total at any position distant $s$ from $t$
is $V(t+s)\,p(s)$. Thus, given a full knowledge of the population
$V(t)$ and the process $p(s)$ of selection, we can say that a member
of the sample at $t$ has a probability

$$\frac{V(t+s)\,p(s)}{\sum_s V(t+s)\,p(s)}$$

that it has come from the position $t+s$ in the $V(t)$ diagram.



Fig. 25 


This, in effect, is Bayes’s theorem. The frequency function 
U(t) enables us to specify the probability that a member of the 
sample will lie at t; this is the initial probability conditioned
only by the statement that the individual is a member of U(t). 
At this stage the theorem enters to tell us the probable source 
of this value of t when further information is available—the 
information being that the distribution U(t) has been derived 
from the source V(t) by a certain process p(s). 

Ex. 1. Three boxes contain balls as shown:

  Box 1      Box 2      Box 3
  1 black    1 black    1 black
  1 white    3 yellow   4 green


It is known that a fourth box has dropped into it a ball from 
Box 1, two balls from Box 2, and one from Box 3. What is the 
probability that a ball in this box, known to be black, came 
from Box 2?

Without the information that it is black, the probability
that it came from Box 2 is clearly $\frac{2}{1+2+1} = \frac{1}{2}$. We ask, what
difference is made in this probability by the additional information
that the ball is black. Actually, the additional fact converts
the problem into a new one; and the comparison of the
answers to the two problems, a step usually associated with
Bayes’s theorem, has nothing to do with the question.

In order to calculate the required probability we have to
construct the functions $V(t)$ and $p(s)$, the variable $t$ being the
suffix attached to each of the three boxes. Thus, $V(1)$, $V(2)$,
and $V(3)$ are the numbers of balls in Box 4 which come from
Boxes 1, 2, and 3, respectively. The probabilities of a black
ball in the three cases are

$p(1) = \tfrac{1}{2}$, $p(2) = \tfrac{1}{4}$, and $p(3) = \tfrac{1}{5}$, respectively.

Hence

$$\sum_s V(t+s)\,p(s) = 1\cdot\tfrac{1}{2} + 2\cdot\tfrac{1}{4} + 1\cdot\tfrac{1}{5} = \tfrac{6}{5}.$$

The contribution $V(2)\,p(2)$ is $2\cdot\tfrac{1}{4}$. Thus the required probability
is $\tfrac{1}{2} \div \tfrac{6}{5} = \tfrac{5}{12}$.

The distinction between the two problems is now clear: the
probability of a ball drawn from Box 4 having come from Box 2
is $\tfrac{1}{2}$, while if a ball is drawn from Box 4 and found to be black,
the probability that it came from Box 2 is $\tfrac{5}{12}$.
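The arithmetic of this example can be sketched in a few lines. The dictionaries below hold the values of $V$ and $p$ stated in the example; the function name is an illustrative choice, not notation from the text.

```python
from fractions import Fraction

def source_probabilities(V, p):
    """Bayes's rule in the text's notation: V[k] * p[k] / sum_j V[j] * p[j]."""
    contributions = {k: V[k] * p[k] for k in V}
    total = sum(contributions.values())
    return {k: c / total for k, c in contributions.items()}

# V(k): number of balls in Box 4 that came from Box k.
V = {1: 1, 2: 2, 3: 1}
# p(k): probability that a ball taken from Box k is black.
p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 5)}

posterior = source_probabilities(V, p)
prior_box2 = Fraction(V[2], sum(V.values()))
```

Here `prior_box2` is the probability before the colour is known and `posterior[2]` the probability once the ball is known to be black.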

Ex. 2. Given $n_1$ urns $A_1$ each containing $\nu_1$ white balls, $n_2$
urns $A_2$ each containing $\nu_2$ white balls, ... and $n_r$ urns $A_r$ each
containing $\nu_r$ white balls: one of the urns is chosen and a ball
extracted, which turns out to be white. What is the probability
that it came from one of the $n_1$ urns $A_1$?

We may suppose the balls placed together in one urn, provided
it is always possible to specify the urns from which they
came: we do not thus alter the probability of extracting a white
ball. The total number of white balls is $n_1\nu_1 + n_2\nu_2 + \ldots + n_r\nu_r$,
of which $n_1\nu_1$ come from the urns $A_1$. If now a white ball is
extracted, the probability that it is from one of the set $A_1$ is

$$\frac{n_1\nu_1}{n_1\nu_1 + n_2\nu_2 + \ldots + n_r\nu_r}.$$

In applications of Bayes’s theorem, it must be understood
that the structure of the original population is precisely delimited:
what we ask is whether, when a particular event among
a series occurs, its source can be traced to this or that element
of the structure. When the problem is stated in this way,
Bayes’s theorem gives a definite answer. If, however, we
attempt to recast the problem by raising a query about the
structure of the population, then we are faced with the solution
of an equation (p. 148) to which a unique answer cannot
necessarily be given, since an element of arbitrariness is
present.

Ex. 3. An urn contains $a$ black and white balls, in unknown
proportions: a ball is extracted $n$ times and each time replaced
in the urn. If $\nu$ of the balls extracted are white, what is the
probability that $\alpha$ of the balls in the urn are white?

The required probability is that of a subclass of the subclass
of urns in the population of urns containing $a$ black and white
balls, which contain precisely $\alpha$ white balls. Thus we must
imagine the urn in question to come from a population of urns,
each of which contains $a$ black and white balls, the population
covering all possible compositions. In this population, the first
subclass consists of urns containing no white balls, the second
consists of urns containing one white ball, and so on. Then the
probability that, if an urn containing $i$ white balls is selected,
$\nu$ white balls will be obtained in $n$ extractions, is by Bernoulli’s
theorem,

$${}^{n}C_{\nu}\Bigl(\frac{i}{a}\Bigr)^{\nu}\Bigl(1 - \frac{i}{a}\Bigr)^{n-\nu}.$$

Now suppose that the probabilities of choosing the first, second,
third, ... subclasses of urn are $p_0, p_1, p_2, \ldots$ respectively. Then,
by Bayes’s theorem, the probability that the urn chosen is one
containing $\alpha$ white balls is

$$\frac{\alpha^{\nu}(a-\alpha)^{n-\nu}\,p_\alpha}{1^{\nu}(a-1)^{n-\nu}p_1 + 2^{\nu}(a-2)^{n-\nu}p_2 + \ldots + (a-1)^{\nu}1^{n-\nu}p_{a-1}}.$$

It will be noted, therefore, that the solution of the problem
depends on a knowledge of the probabilities $p_1, p_2, \ldots$ about
which we have no information whatever. If we make the
assumption that all types of urn have equal probability, then
$p_1 = p_2 = \ldots = p_{a-1}$ and the probability sought is

$$\frac{\alpha^{\nu}(a-\alpha)^{n-\nu}}{1^{\nu}(a-1)^{n-\nu} + 2^{\nu}(a-2)^{n-\nu} + \ldots + (a-1)^{\nu}1^{n-\nu}}.$$

It is readily shown that this is a maximum for the value of $\alpha$
such that $\frac{\alpha}{a} = \frac{\nu}{n}$: that is, the most probable composition of the
urn is that which gives for the required probability the value $\frac{\nu}{n}$.
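A small numerical sketch of this result follows. It assumes equal prior probability for every composition, as in the last step of the example; compositions with no white or no black balls are included but receive zero weight whenever the sample contains both colours. The values $a = 10$, $n = 20$, $\nu = 8$ are illustrative, and the function name is hypothetical.

```python
from fractions import Fraction

def posterior_white_count(a, n, nu, prior=None):
    """Posterior over the number i of white balls among a, given nu white in n draws.

    The binomial coefficient and the factor a^(-n) are common to every term
    and cancel, so only i^nu * (a - i)^(n - nu) is needed.
    """
    counts = range(0, a + 1)
    if prior is None:
        # Assumption: all compositions equally probable beforehand.
        prior = {i: Fraction(1, a + 1) for i in counts}
    weights = {i: prior[i] * i**nu * (a - i) ** (n - nu) for i in counts}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

post = posterior_white_count(a=10, n=20, nu=8)
most_probable = max(post, key=post.get)
```

With these figures the most probable composition has `most_probable / 10` equal to the observed proportion `8 / 20`; a different `prior` would in general move the maximum.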

Extension to Functions of a Continuous Variable 
Let $V(t)$ be a function of a continuous variable $t$ which is
defined in the range $(-\infty, \infty)$ and which gives the probability
$V(t)$ of the occurrence of the variable in the interval $dt$
about the position $t$. Suppose that a new population is constructed
from the distribution according to the following law:
at a distance $x$ from the position $t$, the ordinate $V(t+x)$ is to
be swept in with a probability $p(x)$ and allocated to the position
$t$, the value of $x$ extending over the range $a \le x \le b$. If $U(t)$ is
the probability, in the new population, of a value $t$ occurring
in an interval $dt$ about $t$, then the contribution to $U(t)$ at the
position $t$ is given by

$$V(t+x)\,p(x).$$

Thus the probability $U(t)$ is given by

$$U(t) = \int_a^b V(t+x)\,p(x)\,dx.$$

It should be noticed that if the original probability function
$V(t)$ has a finite range, then the function $V(t+x)$, for values of
$x$ which take it beyond this range, is, of course, zero and makes
no contribution to the integral. Thus $U(t)$ has exactly the same
range as the original function $V(t)$.

Ex. 1. An interesting application of the previous results has
been made by Eddington.† Suppose that the probability
function $u(t)$ for the sample is given, and that the law of
selection is Gaussian, i.e.

$$p(x) = \frac{h}{\sqrt{\pi}}\exp(-h^2x^2).$$

† Eddington, Monthly Notices, R.A.S. 73, 359.

Then the probability function $v(t)$ of the original population
is given by the equation

$$u(t) = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty} v(t+x)\exp(-h^2x^2)\,dx.$$

Now write

$$v(t+x) = v(t) + x\frac{dv}{dt} + \frac{x^2}{2!}\frac{d^2v}{dt^2} + \ldots$$
$$= \Bigl(1 + x\frac{d}{dt} + \frac{x^2}{2!}\frac{d^2}{dt^2} + \ldots\Bigr)v(t)$$
$$= \exp\Bigl(x\frac{d}{dt}\Bigr)v(t), \text{ symbolically.}$$

Thus

$$u(t) = \Bigl\{\frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty}\exp\Bigl(x\frac{d}{dt} - h^2x^2\Bigr)dx\Bigr\}v(t).$$

The integral is an operator as regards $t$, but a definite integral
as regards $x$.

Now

$$\int_{-\infty}^{\infty}\exp(-ax - bx^2)\,dx = \sqrt{\frac{\pi}{b}}\exp(a^2/4b);$$

then writing $a = -\dfrac{d}{dt}$, $b = h^2$, we have

$$u(t) = \exp\Bigl(\frac{1}{4h^2}\frac{d^2}{dt^2}\Bigr)v(t), \quad\text{so that}\quad v(t) = \exp\Bigl(-\frac{1}{4h^2}\frac{d^2}{dt^2}\Bigr)u(t).$$

When $h$ is large, it is sufficient to consider the first few terms of
this expression. Since $u(t)$ is an empirical probability function,
it is better to express $v(t)$ in terms of $u(t)$ and its successive
differences rather than its differential coefficients.

Now $u''(t) = \Delta^2u(t) - \Delta^3u(t) + \tfrac{11}{12}\Delta^4u(t) - \ldots$,

and $u^{\mathrm{iv}}(t) = \Delta^4u(t) - \ldots$.

Hence

$$v(t) = u(t) - \frac{1}{4h^2}\bigl\{\Delta^2u(t) - \Delta^3u(t) + \tfrac{11}{12}\Delta^4u(t)\bigr\} + \frac{1}{32h^4}\Delta^4u(t) - \ldots$$

A comparison between the result just obtained and Bayes’s 
theorem is inevitable. There, the passage back from the 
function U(t) to V(t) was not necessarily possible for any given 
function U(t) nor, when possible, was it necessarily unique; 
we saw that this circumstance was associated with the fact 
that the range of $U(t)$ was not in general that of $V(t)$, and that
this depended entirely on the law of selection $p(x)$. In our
example, however, the variable $t$ is continuous and the two
ranges are identical; the passage back, when it can be achieved,
is unique: no arbitrariness is involved. In that case, Bayes’s
theorem, as stated above, gives the probability $U(t)$ that a
certain variable $t$, derived from a function $V(t)$ calculated by
the process defined by $p(x)$, came from the range $(a, b)$. This
is the inverse form of Bayes’s theorem as usually applied to 
determine the ‘probability of causes’, and the application is 
legitimate if we bear in mind that the method of selection p(x) 
is assumed to be given; it cannot be chosen arbitrarily. 

From a knowledge of $U(t)$ both $V(t)$ and $p(x)$ cannot be determined
separately; and it is by ignoring this vital fact and by
tacitly assuming that $p(x)$ is some such function as unity or
$\frac{h}{\sqrt{\pi}}\exp(-h^2x^2)$, that writers have been led to conclude that
Bayes’s theorem may be used to trace back, with a certain
degree of probability, the antecedent events which have given
rise to the function $U(t)$. This procedure, as we have seen, is
wholly fallacious.

In the foregoing example it is assumed that the law of selection
is the normal error law. If this law is not obeyed in the
case to which it is applied, the result will be invalid in practice.
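When the Gaussian law of selection does hold, the symbolic inversion $v(t) = \exp\bigl(-\tfrac{1}{4h^2}\tfrac{d^2}{dt^2}\bigr)u(t)$ of Ex. 1 can be checked exactly on polynomials, for which the series terminates. The sketch below is an illustration under that restriction: it smooths $v(t) = t^4$ using the even moments of the Gaussian, $E[x^{2m}] = (2m)!/(m!\,4^m h^{2m})$, and then recovers it. Polynomials are stored as coefficient lists, and all names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def nth_derivative(coeffs, n):
    """n-th derivative of the polynomial c0 + c1 t + c2 t^2 + ..."""
    for _ in range(n):
        coeffs = [k * c for k, c in enumerate(coeffs)][1:]
    return coeffs

def add_scaled(acc, coeffs, factor):
    """Return acc + factor * coeffs (coeffs may be shorter than acc)."""
    out = list(acc)
    for k, c in enumerate(coeffs):
        out[k] += factor * c
    return out

def smooth(v, h2):
    """u(t) = E[v(t + X)], X having density (h/sqrt(pi)) exp(-h^2 x^2); h2 = h^2."""
    u = [Fraction(0)] * len(v)
    m = 0
    while 2 * m < len(v):
        moment = Fraction(factorial(2 * m), factorial(m) * 4**m) / h2**m
        u = add_scaled(u, nth_derivative(v, 2 * m), moment / factorial(2 * m))
        m += 1
    return u

def unsmooth(u, h2):
    """v = exp(-D^2 / 4h^2) u, summed until the derivatives vanish."""
    v = [Fraction(0)] * len(u)
    m = 0
    while 2 * m < len(u):
        factor = Fraction((-1) ** m, factorial(m)) / (4 * h2) ** m
        v = add_scaled(v, nth_derivative(u, 2 * m), factor)
        m += 1
    return v

v = [Fraction(0)] * 4 + [Fraction(1)]   # v(t) = t^4
u = smooth(v, Fraction(2))              # illustrative choice h^2 = 2
recovered = unsmooth(u, Fraction(2))
```

The check says nothing about samples drawn under some other law of selection, which is the point of the warning above.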

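Ex. 2. Let us now suppose that the functions $v(x)$ and $p(x)$ are
both Gaussian, so that

$$v(x) = \frac{h}{\sqrt{\pi}}\exp(-h^2x^2), \quad\text{and}\quad p(x) = \frac{h'}{\sqrt{\pi}}\exp(-h'^2x^2),$$

where $h$ and $h'$ are known constants. Then $u(t)$ is given by

$$u(t) = \frac{hh'}{\pi}\int_{-\infty}^{\infty}\exp\{-h^2(t+x)^2 - h'^2x^2\}\,dx$$
$$= \frac{hh'}{\pi}\exp\Bigl(-\frac{h^2h'^2t^2}{h^2+h'^2}\Bigr)\int_{-\infty}^{\infty}\exp\Bigl\{-(h^2+h'^2)\Bigl(x + \frac{h^2t}{h^2+h'^2}\Bigr)^2\Bigr\}\,dx,$$

by completing the square in the exponent.

By changing the variable to $z = x + \dfrac{h^2t}{h^2+h'^2}$, we reduce the
integral to the form

$$\int_{-\infty}^{\infty}\exp\{-(h^2+h'^2)z^2\}\,dz = \frac{\sqrt{\pi}}{\sqrt{h^2+h'^2}}$$

on evaluation.

It follows that

$$u(t) = \frac{hh'}{\sqrt{\pi}\sqrt{h^2+h'^2}}\exp\Bigl(-\frac{h^2h'^2t^2}{h^2+h'^2}\Bigr).$$

Hence the function $u(t)$ also follows a Gaussian law

$$u(t) = \frac{h''}{\sqrt{\pi}}\exp(-h''^2t^2), \quad\text{where}\quad h''^2 = \frac{h^2h'^2}{h^2+h'^2}.$$

If the standard deviations of $v(x)$ and $p(x)$ are $\sigma$ and $\sigma'$,
respectively, so that $\sigma = \dfrac{1}{h\sqrt{2}}$, $\sigma' = \dfrac{1}{h'\sqrt{2}}$, the standard deviation
$\sigma''$ of $u(t)$ is therefore given by $\sigma''^2 = \sigma^2 + \sigma'^2$.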

Evidently the theorem we have obtained can be inverted;
for if $u(t)$ and $p(x)$ are both Gaussian functions, similar reasoning
shows that $v(x)$ must also be Gaussian. Thus, in conclusion,
we have the result:


If the distribution of the original population and the probability
of sampling follow the Gaussian law, then the sample also follows
the Gaussian law; and if the sample and the probability of sampling
follow the Gaussian law, so does the original population.
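The direct half of this result can be checked numerically. The sketch below evaluates $u(t) = \int v(t+x)\,p(x)\,dx$ by the trapezoidal rule for two Gaussians and compares it with a Gaussian of precision $h'' = hh'/\sqrt{h^2 + h'^2}$. The values of $h$ and $h'$, the integration limits, and the step count are illustrative assumptions.

```python
import math

def gaussian(x, h):
    """Gaussian law with precision constant h: (h / sqrt(pi)) exp(-h^2 x^2)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)

def sample_density(t, h, h_prime, half_width=12.0, steps=24000):
    """Trapezoidal estimate of the integral of v(t + x) p(x) over x."""
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gaussian(t + x, h) * gaussian(x, h_prime)
    return total * dx

def combined_precision(h, h_prime):
    """h'' such that 1/h''^2 = 1/h^2 + 1/h'^2."""
    return h * h_prime / math.sqrt(h * h + h_prime * h_prime)

h, h_prime = 1.3, 0.7   # illustrative precision constants
h_combined = combined_precision(h, h_prime)
```

Since $\sigma = 1/(h\sqrt{2})$, the relation between the precision constants is the same statement as the addition of variances.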

Ex. 3. This result may be extended to a series of samples,
each of which is drawn from the preceding one. Thus, suppose
that $v(t)$ is a Gaussian population with precision constant $h$ and that $\exp(-h'^2x^2)$ is the
probability of the sampling. Then a sample $u_1(t)$ of the population
is given by

$$u_1(t) = \int_{-\infty}^{\infty} v(t+x)\exp(-h'^2x^2)\,dx.$$

Then, by the above theorem, $u_1(t)$ is of the form $A\exp(-h_1^2t^2)$,
where

$$\frac{1}{h_1^2} = \frac{1}{h^2} + \frac{1}{h'^2}.$$

If now a sample $u_2(t)$ is drawn from the sample $u_1(t)$, its
magnitude is given by

$$u_2(t) = \int_{-\infty}^{\infty} u_1(t+x)\exp(-h'^2x^2)\,dx,$$

and the corresponding constant $h_2$ satisfies the relation

$$\frac{1}{h_2^2} = \frac{1}{h_1^2} + \frac{1}{h'^2}.$$

Similarly for a sample $u_3(t)$ drawn from $u_2(t)$, and so on.
Hence the constant $h_n$ specifying the $n$th sample in this succession
is given by

$$\frac{1}{h_n^2} = \frac{1}{h_{n-1}^2} + \frac{1}{h'^2}.$$

Adding the $n$ equations so obtained, we have

$$\frac{1}{h_n^2} = \frac{1}{h^2} + \frac{n}{h'^2}.$$

In terms of the standard deviations this becomes†

$$\sigma_n^2 = \sigma^2 + n\sigma'^2.$$


Two-dimensional Distributions 

Suppose, for instance, that a sample $U(x, y)$ is obtained by
taking the mean of the values of the population $V(x, y)$ at the
four points $(x \pm 1, y \pm 1)$. We then have the equation

$$4U(x,y) = V(x+1,y+1) + V(x-1,y+1) + V(x-1,y-1) + V(x+1,y-1).$$

† Cf. p. 130.

If we write

$$EV(x,y) = V(x+1,y) \quad\text{and}\quad FV(x,y) = V(x,y+1),$$

we obtain

$$4U(x,y) = (EF + E^{-1}F + E^{-1}F^{-1} + EF^{-1})\,V(x,y)$$
$$= (E + E^{-1})(F + F^{-1})\,V(x,y)$$
$$= \frac{(E^2+1)(F^2+1)}{EF}\,V(x,y).$$

The particular solution of this equation is

$$V(x,y) = \frac{4EF}{(E^2+1)(F^2+1)}\,U(x,y) = \frac{(1+\Delta)(1+\Delta')}{\{1+\tfrac{1}{2}(2\Delta+\Delta^2)\}\{1+\tfrac{1}{2}(2\Delta'+\Delta'^2)\}}\,U(x,y),$$

where the operators $\Delta$ and $\Delta'$ refer to $x$ and $y$ respectively.
Thus

$$V(x,y) = (1+\Delta)(1+\Delta')(1 - \Delta + \tfrac{1}{2}\Delta^2 - \ldots)(1 - \Delta' + \tfrac{1}{2}\Delta'^2 - \ldots)\,U(x,y)$$
$$= (1 - \tfrac{1}{2}\Delta^2 + \ldots)(1 - \tfrac{1}{2}\Delta'^2 + \ldots)\,U(x,y)$$
$$= U(x,y) - \tfrac{1}{2}(\Delta^2 + \Delta'^2)\,U(x,y) + \tfrac{1}{4}\Delta^2\Delta'^2\,U(x,y) \ldots.$$

To this must be added the complementary function

$$A\sin\tfrac{1}{2}\pi x + B\cos\tfrac{1}{2}\pi x + C\sin\tfrac{1}{2}\pi y + D\cos\tfrac{1}{2}\pi y.$$

Ex. A certain substance is being deposited on the inside of a tube
6 cm. in length, and measurements of the extent of the deposit are taken
at intervals of one second at distances of 1 cm. along the tube. The
following are the results obtained (in grammes):

                       Values of t
  Values of x     0      1      2      3      4
       1        0.35   0.45   0.55   0.65   0.75
       2        0.65   0.85   1.05   1.25   1.45
       3        1.15   1.45   1.75   2.05   2.35
       4        1.85   2.25   2.65   3.05   3.45
       5        2.75   3.25   3.75   4.25   4.75
       6        3.85   4.45   5.05   5.65   6.25

Assuming that the readings at $(x, t)$ were really the average of those at
$x$ and $x+1$, taken at times $t$ and $t+1$ respectively, correct the above
data so as to give the true values at $(x, t)$.

If the true value is $V(x, t)$ and the sample is $U(x, t)$, then

$$U(x,t) = \tfrac{1}{2}\{V(x,t) + V(x+1,t+1)\} = \tfrac{1}{2}(EF+1)\,V(x,t).$$

Hence

$$V(x,t) = \frac{2}{(1+\Delta)(1+\Delta')+1}\,U(x,t)$$
$$= \{1 + \tfrac{1}{2}(\Delta+\Delta'+\Delta\Delta')\}^{-1}\,U(x,t)$$
$$= \{1 - \tfrac{1}{2}(\Delta+\Delta'+\Delta\Delta') + \tfrac{1}{4}(\Delta+\Delta'+\Delta\Delta')^2 - \ldots\}\,U(x,t),$$

and the problem is reduced to that of constructing a twofold difference
table from the one given.
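As a sketch of how the correction might be carried out, the code below assumes that the tabulated readings follow the closed form $U(x,t) = 0.1x^2 + 0.1xt + 0.25$. This is an observation about the figures rather than a statement in the text, and it allows differences to be taken beyond the edge of the table. The operator series then terminates after the second power. Function names are hypothetical.

```python
from fractions import Fraction

def U(x, t):
    """Assumed closed form for the tabulated readings (in grammes)."""
    return Fraction(1, 10) * x * x + Fraction(1, 10) * x * t + Fraction(1, 4)

def diff_x(f):
    """Forward difference in x (the operator Delta)."""
    return lambda x, t: f(x + 1, t) - f(x, t)

def diff_t(f):
    """Forward difference in t (the operator Delta')."""
    return lambda x, t: f(x, t + 1) - f(x, t)

def S(f):
    """The combined operator Delta + Delta' + Delta Delta'."""
    fx, ft = diff_x(f), diff_t(f)
    fxt = diff_t(fx)
    return lambda x, t: fx(x, t) + ft(x, t) + fxt(x, t)

def corrected(f, terms=3):
    """V = (1 - S/2 + S^2/4 - ...) U, truncated after `terms` powers of S."""
    def V(x, t):
        total, power = f(x, t), f
        for k in range(1, terms + 1):
            power = S(power)
            total += Fraction(-1, 2) ** k * power(x, t)
        return total
    return V

V = corrected(U)
true_values = {(x, t): V(x, t) for x in range(1, 7) for t in range(0, 5)}
```

Averaging `V(x, t)` and `V(x + 1, t + 1)` reproduces `U(x, t)`, which is the stated relation between reading and true value.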

Two-dimensional Continuous Distributions 
Suppose that the given sample $u(x, y)$ is a continuous function
of two independent variables $x$, $y$, and that it is derived from
a population $v(x, y)$ which is also a continuous function of the
same variables. Let the selective process $p(\xi, \eta)$ by which
$u(x, y)$ is obtained from $v(x, y)$ be such that the probability of
choosing a sample in a region of area $d\xi\,d\eta$ surrounding the
point $(\xi, \eta)$ is $p(\xi, \eta)\,d\xi\,d\eta$. Then the law connecting
sample and population is evidently

$$u(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} v(x+\xi, y+\eta)\,p(\xi,\eta)\,d\xi\,d\eta.$$

Let us apply this result to the case in which the law of
selection is Gaussian, so that, for instance,

$$p(\xi,\eta) = \frac{hk}{\pi}\exp(-h^2\xi^2 - k^2\eta^2).$$

We then have

$$u(x,y) = \frac{hk}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} v(x+\xi, y+\eta)\exp(-h^2\xi^2 - k^2\eta^2)\,d\xi\,d\eta.$$

If we denote the operators $\partial/\partial x$, $\partial/\partial y$ by $D$ and $D'$ respectively,
we may write

$$v(x+\xi,y+\eta) = v(x,y) + (\xi D + \eta D')\,v(x,y) + \frac{(\xi D + \eta D')^2}{2!}\,v(x,y) + \ldots$$
$$= \exp(\xi D + \eta D')\,v(x,y).$$

Hence, symbolically,

$$u(x,y) = \frac{hk}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp(\xi D + \eta D')\,v(x,y)\exp\{-(h^2\xi^2 + k^2\eta^2)\}\,d\xi\,d\eta$$
$$= \frac{hk}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp(\xi D - h^2\xi^2)\exp(\eta D' - k^2\eta^2)\,v(x,y)\,d\xi\,d\eta$$
$$= \frac{hk}{\pi}\int_{-\infty}^{\infty}\exp(\xi D - h^2\xi^2)\,d\xi\int_{-\infty}^{\infty}\exp(\eta D' - k^2\eta^2)\,d\eta\;v(x,y).$$

Now

$$\frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty}\exp(\xi D - h^2\xi^2)\,d\xi = \exp(D^2/4h^2)$$

and

$$\frac{k}{\sqrt{\pi}}\int_{-\infty}^{\infty}\exp(\eta D' - k^2\eta^2)\,d\eta = \exp(D'^2/4k^2),$$

so that

$$u(x,y) = \exp(D^2/4h^2 + D'^2/4k^2)\,v(x,y).$$

Thus

$$v(x,y) = \exp\Bigl\{-\Bigl(\frac{D^2}{4h^2} + \frac{D'^2}{4k^2}\Bigr)\Bigr\}u(x,y),$$

or

$$v(x,y) = \Bigl\{1 - \frac{D^2}{4h^2} + \frac{D^4}{2!\,(4h^2)^2} - \ldots\Bigr\}\times\Bigl\{1 - \frac{D'^2}{4k^2} + \frac{D'^4}{2!\,(4k^2)^2} - \ldots\Bigr\}u(x,y).$$


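Ex. Suppose that $u(x, y)$ is the mean value over a square of side $2T$
about the point $(x, y)$; then

$$u(x,y) = \frac{1}{4T^2}\int_{-T}^{T}\int_{-T}^{T} v(x+\xi, y+\eta)\,d\xi\,d\eta$$
$$= \frac{1}{4T^2}\int_{-T}^{T}\int_{-T}^{T}\exp(\xi D + \eta D')\,v(x,y)\,d\xi\,d\eta, \text{ as before,}$$
$$= \frac{1}{4T^2}\int_{-T}^{T}\exp(\xi D)\,d\xi\int_{-T}^{T}\exp(\eta D')\,d\eta\;v(x,y)$$
$$= \frac{1}{4T^2}\Bigl[\frac{\exp(TD) - \exp(-TD)}{D}\Bigr]\Bigl[\frac{\exp(TD') - \exp(-TD')}{D'}\Bigr]v(x,y)$$
$$= \frac{1}{T^2}\,\frac{\sinh TD\,\sinh TD'}{DD'}\,v(x,y).$$

Hence

$$v(x,y) = \frac{TD}{\sinh TD}\cdot\frac{TD'}{\sinh TD'}\,u(x,y)$$
$$= (1 + \tfrac{1}{6}T^2D^2 + \ldots)^{-1}(1 + \tfrac{1}{6}T^2D'^2 + \ldots)^{-1}\,u(x,y)$$
$$= u - \frac{T^2}{6}\Bigl(\frac{\partial^2u}{\partial x^2} + \frac{\partial^2u}{\partial y^2}\Bigr) \text{ approximately,}$$

if higher derivatives of $u(x,y)$ may be neglected.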


2. The determination of a population from a given set of samples

On the Determination of Hypothetical Populations 

The crucial problem with which this chapter has been concerned
is how to make the fullest use of samples of a population
for the drawing of conclusions regarding its structure; we are 
in fact trying to arrive at a mathematical method that will 
assist us in learning from experience. The general principle, 
already adopted in Chapter VIII, which we use for this purpose 
may be stated as follows: 

(1) We assume a class of hypothetical populations capable of 
providing the samples found. 

(2) This capacity involves the further assumption of a method 
of selecting the samples. 

(3) We can then write down the probability that from any 
one of the class of populations, with this method of selection, 
precisely the given set of samples will be obtained. The probability
will in general vary for different members of the class of
populations; we then determine the member for which this 
probability is greatest, and choose it as the ‘most likely’ for 
the given set of samples in contrast to the ‘most probable’ 
sample. 

The term ‘most likely’ is used here because now we are 
actually concerned with a new type of problem; we are not 
in fact discussing the question of the probability of occurrence 
of a particular population among a given class: the probability 
is now attached to the samples, not to the population. Thus, 
the ‘most likely’ population is defined as that member of a 
given class which yields the given samples with the greatest 
probability. 

Suppose, for instance, that $x_1, x_2, \ldots, x_n$ are the measured
observations of a number $x$; then the deviations of the observations
from $x$ are respectively $x - x_1, x - x_2, \ldots, x - x_n$.

If $p(\epsilon)$ is the probability that the method of observation gives
a deviation of magnitude $\epsilon$, the probabilities of obtaining the
stated deviations are respectively

$$p(x - x_1),\ p(x - x_2),\ \ldots,\ p(x - x_n).$$

Hence the probability of obtaining a combination of deviations
$x - x_1, x - x_2, \ldots, x - x_n$ simultaneously is the product

$$p(x - x_1)\,p(x - x_2)\ldots p(x - x_n).$$

We propose to assume that the best approximation to $x$
derivable from these samples is that value which makes the
given combination of deviations the most probable that would
occur in a sample of $n$ observations. In effect we inquire, what
process of selection applied to the readings $x$ will give precisely
this combination with the greatest probability? We are now in
a position to apply this principle to a series of cases; we illustrate
first with a case in which the samples have been obtained
by a Bernoulli law of selection.

The Method of Maximum Likelihood 

Suppose that the members of a population of given number
$N$ possess a certain characteristic in the unknown proportion
$p : 1$. If a series of samples $n_1, n_2, \ldots$ in number drawn from it is
found to contain the characteristic in the proportions $r_1/n_1$,
$r_2/n_2, \ldots$, what information can be deduced with regard to the
value of p? This is to raise a problem in induction if it is 
implied that the population has to be specified by means of 
the samples; and like all such problems it can be reduced to 
a deductive one by making an appropriate assumption. We 
postulate a class of population, capable of yielding the given 
samples, for which the probability of drawing the samples is 
calculable; we then inquire which member of this class will 
with the greatest probability furnish precisely the samples that 
have been found. 

Thus, if a penny is tossed 100 times and found to give 50
heads, any probability $p$ of obtaining a head, other than the
value $p = \tfrac{1}{2}$, would give a smaller probability for the observed
occurrence than if $p$ were actually $\tfrac{1}{2}$.

If we apply these considerations to a class of Bernoulli distributions
defined by the probability law ${}^{n}C_r\,p^r(1-p)^{n-r}$, where
$n$ is the size of the sample and $r$ members possess the required
quality, then the problem resolves itself into finding the value
of $p$ for which this probability is greatest. Since $p$ does not occur
in the coefficient ${}^{n}C_r$, we have simply to maximize $p^r(1-p)^{n-r}$,
that is, to choose $p$ so that $r/p = (n-r)/(1-p)$, or $r = np$.

The expression $p^r(1-p)^{n-r}$ is called by R. A. Fisher the ‘likelihood’:
it is not in itself a probability, as we have seen, but an
instrument for selecting the ‘most likely’ population from
among a given class.
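The maximizing value $p = r/n$ can be illustrated by a direct search over a grid of candidate values. This is only a numerical sketch; the grid spacing and function names are arbitrary choices.

```python
def likelihood(p, r, n):
    """Fisher's likelihood p^r (1 - p)^(n - r) for r successes in n trials."""
    return p**r * (1 - p) ** (n - r)

def most_likely_p(r, n, grid=1000):
    """Grid value of p in (0, 1) at which the likelihood is greatest."""
    candidates = [k / grid for k in range(1, grid)]
    return max(candidates, key=lambda p: likelihood(p, r, n))

penny = most_likely_p(50, 100)   # the penny tossed 100 times with 50 heads
```

The likelihood at the maximum is itself very small, which is consistent with the remark that it is not to be read as a probability of the population.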

Ex. 1. An urn contains $N$ black and white balls, $pN$ of which
are white. From it are drawn $n_1$ balls, each being replaced before
another is drawn, and $r_1$ of these are found to be white. A second
such extraction of $n_2$ balls is made, and among them are $r_2$ white
balls. What is the most likely value of $p$?

The probability of obtaining the sample in question is

$${}^{n_1}C_{r_1}\,p^{r_1}(1-p)^{n_1-r_1} \times {}^{n_2}C_{r_2}\,p^{r_2}(1-p)^{n_2-r_2}.$$

Thus to find the value of $p$ for which this probability is greatest
we have to maximize the expression $p^{r_1+r_2}(1-p)^{n_1+n_2-r_1-r_2}$. In
analogy with the preceding case this gives $r_1 + r_2 = (n_1 + n_2)p$.

Ex. 2. Suppose that the $n_1$ balls are marked as they are
drawn and that in fact no ball is drawn twice. If these $n_1$ balls
are now removed, the probability of obtaining a white ball at
the second extraction is

$$p_1 = (pN - r_1)/(N - n_1),$$

and that of obtaining the given samples is proportional to

$$p^{r_1}(1-p)^{n_1-r_1}\,p_1^{r_2}(1-p_1)^{n_2-r_2},$$

that is, to

$$p^{r_1}(1-p)^{n_1-r_1}(pN - r_1)^{r_2}(N - n_1 + r_1 - pN)^{n_2-r_2}.$$

The value of $p$ for which this is a maximum is given by the
equation

$$\frac{r_1}{p} - \frac{n_1 - r_1}{1-p} + \frac{Nr_2}{pN - r_1} - \frac{N(n_2 - r_2)}{N - n_1 + r_1 - pN} = 0.$$

It will be noticed that, if $N$ is large, the value of $p$ is given by

$$\frac{r_1}{p} - \frac{n_1 - r_1}{1-p} + \frac{r_2}{p} - \frac{n_2 - r_2}{1-p} = 0,$$

so that $r_1 + r_2 = (n_1 + n_2)p$, as before.

Ex. 3. If $N = 14$, $n_1 = 5$, $r_1 = 2$, $n_2 = 2$, $r_2 = 1$, then in Ex. 1 we
have $p = 3/7 = 0.429$. In Ex. 2, the most likely value of $p$ is found by
maximizing the expression

$$p^2(1-p)^3(7p - 1)(11 - 14p).$$

Thus $p$ is the root of the cubic

$$343p^3 - 469p^2 + 164p - 11 = 0$$

lying between $\tfrac{1}{7}$ and $\tfrac{11}{14}$, the range in which the factors $7p - 1$ and $11 - 14p$ are both positive, i.e. $p = 0.431$.
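The admissible root of the cubic can be located numerically by bisection. In the sketch below the bracket $[0.3, 0.6]$ is an assumption chosen to isolate the one root lying in the range where $7p - 1$ and $11 - 14p$ are both positive; the function names are illustrative.

```python
def cubic(p):
    """Left-hand side of 343 p^3 - 469 p^2 + 164 p - 11 = 0."""
    return 343 * p**3 - 469 * p**2 + 164 * p - 11

def likelihood(p):
    """Expression to be maximized in Ex. 2 for N = 14, n1 = 5, r1 = 2, n2 = 2, r2 = 1."""
    return p**2 * (1 - p) ** 3 * (7 * p - 1) * (11 - 14 * p)

def bisect(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi] by repeated halving; f must change sign on the bracket."""
    assert f(lo) * f(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

root = bisect(cubic, 0.3, 0.6)
```

Evaluating `likelihood` a little to either side of `root` confirms that the root is a maximum of the expression and not a minimum.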

The method of maximum likelihood is applicable to hypothetical
populations defined by more than one characteristic.
For example, suppose that an urn contains balls of $t$ different
colours whose relative frequencies are $p_1, p_2, \ldots, p_t$. If a sample
of $n$ balls is extracted and found to contain $r_1$ balls of the first
type, $r_2$ of the second, and so on, we may inquire what values
of $p_1, p_2, \ldots$ make this sample the most probable. The probability
of obtaining the sample is, by Bernoulli’s Theorem,

$$P = \frac{n!}{r_1!\,r_2!\ldots r_t!}\,p_1^{r_1}p_2^{r_2}\ldots p_t^{r_t}, \qquad (1)$$

where

$$p_1 + p_2 + \ldots + p_t = 1 \qquad (2)$$

and

$$r_1 + r_2 + \ldots + r_t = n. \qquad (3)$$

If $P$ is a maximum, so is $\log P$; whence, if $\delta p_1, \delta p_2, \ldots$ denote
variations in $p_1, p_2, \ldots$, we have the condition

$$\frac{r_1}{p_1}\delta p_1 + \frac{r_2}{p_2}\delta p_2 + \ldots + \frac{r_t}{p_t}\delta p_t = 0, \qquad (4)$$

where, by (2),

$$\delta p_1 + \delta p_2 + \ldots + \delta p_t = 0. \qquad (5)$$

Combining (4) and (5) we see that the conditions for a maximum
are

$$\frac{r_1}{p_1} = \frac{r_2}{p_2} = \ldots = \frac{r_t}{p_t} = \frac{r_1 + r_2 + \ldots + r_t}{p_1 + p_2 + \ldots + p_t} = n, \text{ by (3).}$$

It follows that

$$p_1 = r_1/n, \quad p_2 = r_2/n, \quad \ldots, \quad p_t = r_t/n.$$

Ex. An urn contains black, white, and yellow balls in unknown proportions
$p_1 : p_2 : p_3$. Six balls are extracted, replaced, and six others
extracted.

                       Black   White   Yellow
  First extraction       1       2       3
  Second extraction      3       2       1

If the numbers of black, white, and yellow balls obtained in
the two extractions are as shown, the values of $p_1$, $p_2$, $p_3$ for which the
probability of obtaining this pair of samples is greatest are

$$p_1 = p_2 = p_3 = \tfrac{1}{3}.$$

The probability of drawing these samples will then be

$$P = \Bigl(\frac{6!}{1!\,2!\,3!}\Bigr)^2\Bigl(\frac{1}{3}\Bigr)^{12}.$$

                       Black   White   Yellow
  First extraction       2       2       2
  Second extraction      2       2       2

If, instead, the two extractions had given rise to the second set of
numbers shown, the values of $p_1$, $p_2$, $p_3$ obtained by maximizing the
probability of obtaining the samples would have been as before, but
the probability of drawing the samples would be

$$P' = \Bigl(\frac{6!}{2!\,2!\,2!}\Bigr)^2\Bigl(\frac{1}{3}\Bigr)^{12},$$

and this is greater than $P$. In fact we have

$$P'/P = (3!)^2/2^4 = 9/4.$$
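Both probabilities can be evaluated exactly from the multinomial law. The sketch below uses the counts of the two tabulated sets with $p_1 = p_2 = p_3 = \tfrac{1}{3}$; the function and variable names are illustrative.

```python
from fractions import Fraction
from math import factorial

def multinomial_probability(counts, probs):
    """n! / (r1! r2! ...) * p1^r1 * p2^r2 * ... for one extraction."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)          # exact: each partial quotient is an integer
    value = Fraction(coeff)
    for c, p in zip(counts, probs):
        value *= p**c
    return value

probs = [Fraction(1, 3)] * 3

# First set: (black, white, yellow) = (1, 2, 3) and then (3, 2, 1).
P_first = multinomial_probability([1, 2, 3], probs) * multinomial_probability([3, 2, 1], probs)
# Second set: (2, 2, 2) in both extractions.
P_second = multinomial_probability([2, 2, 2], probs) ** 2

ratio = P_second / P_first
```

The factor $(\tfrac{1}{3})^{12}$ is common to both, so `ratio` depends only on the two multinomial coefficients, 60 and 90.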

When we have determined the population for which a given
sample is the most probable, it does not follow that even that
sample is a very ‘probable’ one; its probability will depend on
the number of types that might be drawn from such a hypothetical
population, and on the relative frequency of occurrence
of each type. Let us illustrate with a simple problem.

An urn contains black and white balls in an unknown proportion
$p : 1$. A certain number $n$ is extracted, with replacement
immediately after each extraction, and it is found that $r$ of these
are white. A second sample is obtained in the same manner.
Let us suppose that in all 12 balls have been drawn and 6 of
them found to be white; then the second sample consisted of
$12 - n$ balls, $6 - r$ of which were white.

Since the ratio of the number of white balls extracted to the
total number is $\tfrac{1}{2}$, it follows from the previous discussion (p. 166)
that the ‘most likely’ value of $p$ for the hypothetical population
is $\tfrac{1}{2}$.

Consider now the probability of drawing just such a pair of
samples from a population for which $p$ is actually equal to $\tfrac{1}{2}$.
The probability of drawing the first sample is ${}^{n}C_r(\tfrac{1}{2})^n$, and, since
the balls are then returned to the urn, the probability of
drawing the second is ${}^{12-n}C_{6-r}(\tfrac{1}{2})^{12-n}$. Hence the probability
of obtaining the pair of extractions is

$$P = {}^{n}C_r\;{}^{12-n}C_{6-r}/2^{12},$$

and this will of course vary with $n$ and $r$. It is not difficult to
determine the values of $n$ and $r$ for which $P$ is a maximum;
since $n$ and $r$ can vary independently within the given limits,
we require

$${}^{n-1}C_r\;{}^{13-n}C_{6-r} < {}^{n}C_r\;{}^{12-n}C_{6-r} > {}^{n+1}C_r\;{}^{11-n}C_{6-r}$$

and

$${}^{n}C_{r-1}\;{}^{12-n}C_{7-r} < {}^{n}C_r\;{}^{12-n}C_{6-r} > {}^{n}C_{r+1}\;{}^{12-n}C_{5-r}.$$

From these conditions it follows that

$$\frac{13r}{6} - 1 < n < \frac{13r}{6} \quad\text{and}\quad \frac{n-1}{2} < r < \frac{n+1}{2}.$$

In virtue of the restrictions placed upon $r$, the second condition
is a consequence of the first. We thus obtain the solution
$r = 5$, $n = 10$, or $r = 1$, $n = 2$, and with either of these pairs
of values $P = 504/2^{12} = P_0$, say.

In the accompanying table we give the proportions of white balls
obtained in twelve pairs of extractions, with the corresponding values
of $P$ and $P/P_0$. Thus, although $P_0$ is itself small, it is 504 times as great
as the probability of obtaining the first pair of extractions shown.


  First drawing   Second drawing   P × 2^12   P/P_0
      6 : 6           0 : 6             1      0.002
      5 : 5           1 : 7             7      0.014
      4 : 4           2 : 8            28      0.056
      5 : 6           1 : 6            36      0.072
      4 : 5           2 : 7           105      0.21
      1 : 4           5 : 8           224      0.448
      2 : 6           4 : 6           225      0.45
      2 : 3           4 : 9           378      0.756
      3 : 6           3 : 6           400      0.8
      2 : 4           4 : 8           420      0.84
      0 : 1           6 : 11          462      0.924
      1 : 2           5 : 10          504      1

The Method of Least Squares 

The second law of selection to which we shall apply the foregoing
principle is the Gaussian. It is worth while remarking
that the method which we develop in part covers what is variously
called curve fitting, smoothing of data, and graduation.
Each of these processes, whether it be the determination of a
smooth curve that lies evenly among a set of points, or the
smoothing out of an irregular curve, or the specification of an
algebraic expression to cover a set of data, is in effect the determination
of a hypothetical population, since each is merely a
step towards specifying values of a variable at positions other
than those immediately supplied by the data.

If the assumed hypothetical population is Gaussian, then in
the notation of p. 165, $p(\epsilon) = \frac{h}{\sqrt{\pi}}\exp(-h^2\epsilon^2)$, so that the probability
of obtaining the given sample is

$$\frac{h}{\sqrt{\pi}}\exp\{-h^2(x-x_1)^2\}\cdot\frac{h}{\sqrt{\pi}}\exp\{-h^2(x-x_2)^2\}\ldots\frac{h}{\sqrt{\pi}}\exp\{-h^2(x-x_n)^2\}$$
$$= \frac{h^n}{\pi^{n/2}}\exp\Bigl(-h^2\sum_{r=1}^{n}(x-x_r)^2\Bigr).$$

For a given process of selection, $h$ is a known constant; the problem,
as before, is to find $x$ so that the probability is a maximum.
This is equivalent to determining $x$ so that $\sum_{r=1}^{n}(x-x_r)^2$ is
a minimum and, as we have seen, gives as the value of $x$ the
mean of $x_1, x_2, \ldots, x_n$. This method of determining the best
value of an observation by assuming that the sum of the squares
of the deviations from it shall be a minimum is called the
Method of Least Squares. Some writers prefer to begin with
this method as the initial assumption, without directly implying
the use of a Gaussian law.

Determination of the Precision Constant 

The probability that the set of readings $x_1, x_2, \ldots, x_n$ will
occur is

$$\frac{h^n}{\pi^{n/2}}\exp\Bigl\{-h^2\sum_{r=1}^{n}(x_r-a)^2\Bigr\},$$

where $a$ is the mean of the readings.

Using the same principle as before, the value of $h$ to be chosen is
that which makes the above probability a maximum. Thus $h$ is determined
by the equation

$$\frac{d}{dh}\Bigl\{h^n\exp\Bigl[-h^2\sum_{r=1}^{n}(x_r-a)^2\Bigr]\Bigr\} = 0,$$

so that $n = 2h^2\sum (x_r-a)^2$. Hence

$$h^2 = \frac{n}{2\sum (x_r-a)^2} = \frac{1}{2\sigma'^2},$$

where $\sigma'$ is the standard deviation for the given set of readings.
It follows that our choice of $h$ is such as to make the standard
deviation for the set $x_1, x_2, \ldots, x_n$ coincide with that of the
assumed hypothetical population.
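
The two results above (the mean as the most probable value, and the
precision constant fixed by the standard deviation of the readings) can
be checked numerically. The following Python sketch is illustrative
only: the function name `gaussian_fit` and the sample readings are
assumptions, not taken from the text.

```python
import math

def gaussian_fit(readings):
    """Return (a, h): the mean and the precision constant that maximise
    the Gaussian probability of the given readings."""
    n = len(readings)
    a = sum(readings) / n                      # least-squares value: the mean
    s2 = sum((x - a) ** 2 for x in readings)   # sum of squared deviations
    h = math.sqrt(n / (2.0 * s2))              # from n = 2 h^2 * sum (x_r - a)^2
    return a, h

readings = [10.1, 9.8, 10.3, 9.9, 10.4]        # hypothetical readings
a, h = gaussian_fit(readings)
sigma = math.sqrt(sum((x - a) ** 2 for x in readings) / len(readings))
print(a, h, 1.0 / (sigma * math.sqrt(2.0)))    # the last two values agree
```

The final line prints $h$ alongside $1/(\sigma'\sqrt{2})$, which should
coincide up to rounding.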


Curve Fitting 

Now suppose that $Y = f(x, a)$ represents a possible series of
hypothetical populations, obtained by varying $a$, from one of which the
given sample is presumed to have been drawn. As before, we shall assume
that the probability of committing an error of magnitude $\epsilon$ is
$\frac{h}{\sqrt{\pi}}\exp(-h^2\epsilon^2)$, and that the precision
constant $h$ is the same for each measurement irrespective of its
position in the range. Suppose that readings $y_1, y_2, \ldots, y_n$ are
taken at the positions† $x_1, x_2, \ldots, x_n$, and that
$Y_1, Y_2, \ldots, Y_n$ are the corresponding values of the hypothetical
population. This assumes that the $x$'s are accurate. Then the
probability of drawing this sample from the population whose parameter
is $a$ is

$$\prod_{r=1}^{n}\frac{h}{\sqrt{\pi}}\exp\{-h^2(Y_r-y_r)^2\}
= \frac{h^n}{\pi^{n/2}}\exp\Bigl(-h^2\sum_{r=1}^{n}(Y_r-y_r)^2\Bigr),$$

where $Y_1, Y_2, \ldots, Y_n$ depend on the parameter $a$.

We propose to choose as the hypothetical population among the set
$f(x, a)$ the one that makes the occurrence of this set of readings the
most probable. We have thus to make $\sum (Y_r-y_r)^2$ a minimum, i.e.
we have to choose $a$ so that $\sum [f(x_r, a)-y_r]^2$ is a minimum.
Hence $a$ must satisfy the equation

$$\frac{\partial}{\partial a}\sum_{r=1}^{n}[f(x_r, a)-y_r]^2 = 0,$$

and thus, on the foregoing assumptions, the hypothetical population is
determined.


† If the readings are weighted, i.e. if several readings occur at the
same position, the $x$'s are not all different.
Ex. 1.

    x |   1     2     3     4     5
    y | -0.8   0.9   3.1   5.3   6.8

The values of $y$ shown are subject to accidental errors. Given that
the population $Y$ from which they are extracted is one of the system
$Y = 2x+a$, determine the best value of $a$.

We have to choose $a$ so that the expression

$$(2.8+a)^2 + (3.1+a)^2 + (2.9+a)^2 + (2.7+a)^2 + (3.2+a)^2$$

is a minimum, whence $a = -\dfrac{14.7}{5} = -2.94$.
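
The arithmetic of Ex. 1 can be reproduced in a few lines. This is a
minimal Python sketch; the names `a_best` and `sum_sq` are illustrative
and not part of the text.

```python
xs = [1, 2, 3, 4, 5]
ys = [-0.8, 0.9, 3.1, 5.3, 6.8]

# Minimising sum (2x + a - y)^2 over a gives a = mean of (y - 2x).
a_best = sum(y - 2 * x for x, y in zip(xs, ys)) / len(xs)

def sum_sq(a):
    """Sum of squared deviations of the readings from Y = 2x + a."""
    return sum((2 * x + a - y) ** 2 for x, y in zip(xs, ys))

print(round(a_best, 2), sum_sq(a_best))
```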

Ex. 2. Find the best values of $a$ and $m$ if the values $Y$ are given
by the function $Y = mx+a$.

Ex. 3. If it is desired to represent the following values of $y$
approximately by a function of the form $y = a+bx+cx^2$, determine the
best values of $a$, $b$, and $c$.

    x |  0     1      2      3      4      5      6      7      8     9    10
    y | 7.98  11.51  14.02  15.46  16.01  15.51  13.98  11.52  8.02  3.31  -2

Here $a$, $b$, and $c$ have to be chosen to make the sum of the squares
of the deviations from $y$ a minimum.

Note. Suppose that we wish to fit a Gaussian law of the form
$y = \exp(a+bx+cx^2)$ to a distribution curve. We might proceed by
taking logarithms and determining the best values of $a$, $b$, $c$ (as
in the above example) for the readings. Such a method, although
convenient in practice, is not strictly justifiable, since, if the
errors in $y$ are distributed according to a Gaussian law, those of
$\log y$ are not.

Ex. 4. Find the values of $a$ and $b$ for which the parent population
$y = ax+b\sin x$ would give the pairs of values

    x | 0.2    0.8    1.4    2.0
    y | 0.202  0.882  1.821  3.421

as the most probable, assuming that the deviations follow the Gaussian
law.
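
For Ex. 3, the condition that the sum of squares be a minimum leads to
three linear "normal" equations in $a$, $b$, $c$. The Python sketch
below is one way to solve them; it uses only the ten pairs for
$x = 0, \ldots, 9$, whose values are given to two decimals. The helper
names `solve_linear` and `fit_quadratic` are assumptions made for this
illustration.

```python
def solve_linear(m, v):
    """Solve m * u = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    u = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (aug[i][n] - s) / aug[i][i]
    return u

def fit_quadratic(xs, ys):
    """Least-squares a, b, c in y = a + b x + c x^2 via the normal equations."""
    powers = [sum(x ** k for x in xs) for k in range(5)]            # sum x^k
    moments = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[powers[i + j] for j in range(3)] for i in range(3)]
    return solve_linear(m, moments)

xs = list(range(10))
ys = [7.98, 11.51, 14.02, 15.46, 16.01, 15.51, 13.98, 11.52, 8.02, 3.31]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)
```

The same normal-equation approach extends to Ex. 2 (two unknowns) by
dropping the $x^2$ column.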


The Line of Regression 
Suppose that

$$x_1, x_2, \ldots, x_n$$
$$y_1, y_2, \ldots, y_n$$

are $n$ pairs of data related in the sense that changes in the values of
the $x$'s are accompanied by changes in the $y$'s. Assuming that there
are no errors in the $x$'s, we wish to determine to what extent the
numbers $(x, y)$ may be considered as derivable from the hypothetical
population

$$y = Ax+B,$$

assuming that the deviations follow the Gaussian law.

We have thus to minimize the expression

$$\sum_{r=1}^{n}(Ax_r+B-y_r)^2.$$

This means that $A$ and $B$ must satisfy the equations

$$\sum x_r(Ax_r+B-y_r) = 0,$$
$$\sum (Ax_r+B-y_r) = 0,$$

that is,

$$A\sum x_r^2 + B\sum x_r = \sum x_r y_r, \qquad (1)$$
$$A\sum x_r + nB = \sum y_r. \qquad (2)$$

It is convenient to replace $x$ and $y$ by their deviations from the
corresponding means $X$, $Y$; writing $x = X+\xi$, $y = Y+\eta$, we have

$$\sum x_r = nX, \qquad \sum y_r = nY,$$
$$\sum x_r^2 = \sum (X+\xi_r)^2 = nX^2 + \sum \xi_r^2,$$

and

$$\sum x_r y_r = \sum (X+\xi_r)(Y+\eta_r) = nXY + \sum \xi_r\eta_r.$$

Thus (1) and (2) become

$$A(X^2+\sigma_x^2) + BX = XY + \frac{1}{n}\sum \xi_r\eta_r, \qquad (3)$$
$$AX + B = Y, \qquad (4)$$

where $\sigma_x$ is the standard deviation of the $x$'s from $X$.

Solving (3) and (4) for $A$ and $B$ we obtain

$$A = \frac{\sum \xi_r\eta_r}{n\sigma_x^2} = \frac{\sum \xi_r\eta_r}{\sum \xi_r^2},
\qquad B = Y - X\,\frac{\sum \xi_r\eta_r}{\sum \xi_r^2}.$$

Hence the hypothetical population is given by the curve

$$y-Y = \frac{\sum \xi_r\eta_r}{\sum \xi_r^2}\,(x-X).$$

This curve is called the ‘line of regression’ for the given data, and
can be written as

$$\frac{y-Y}{\sigma_y} = r\,\frac{x-X}{\sigma_x}, \qquad (5)$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of the $x$'s
and $y$'s from their respective means and

$$r = \frac{\sum \xi_r\eta_r}{n\sigma_x\sigma_y}.$$
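
The line of regression and the constant $r$ can be computed directly
from the deviation sums. The following Python sketch is an illustration
under the section's assumptions (no errors in the $x$'s); the function
name `regression_line` is hypothetical, and the data are those of Ex. 1
above.

```python
import math

def regression_line(xs, ys):
    """Return (A, B, r) for the line of regression y = A x + B of y on x."""
    n = len(xs)
    X = sum(xs) / n
    Y = sum(ys) / n
    s_xy = sum((x - X) * (y - Y) for x, y in zip(xs, ys))   # sum xi_r * eta_r
    s_xx = sum((x - X) ** 2 for x in xs)                    # sum xi_r^2
    s_yy = sum((y - Y) ** 2 for y in ys)                    # sum eta_r^2
    A = s_xy / s_xx
    B = Y - A * X
    r = s_xy / math.sqrt(s_xx * s_yy)
    return A, B, r

A, B, r = regression_line([1, 2, 3, 4, 5], [-0.8, 0.9, 3.1, 5.3, 6.8])
print(A, B, r)
```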

Correlation

We are now in a position to examine the problem of ‘correlation’.
Suppose that the two sets of data as in the preceding paragraph have
been found. If the points $(x_r, y_r)$ are plotted on a diagram and form
a ‘good’ curve or lie very nearly on a straight line, then the
principles of curve fitting for the selection of a hypothetical
population can be applied at once because the necessary specification of
the population is not difficult to make. When, however, the points
$(x_r, y_r)$ are so scattered as to render this impossible, we are at
liberty to make any reasonable assumption in terms of which to interpret
the data.

Two methods of procedure are usually adopted. We begin with the
assumption that the $x$'s and $y$'s are attempted measures of points on
some straight line but that the $x$'s are measured without error. Then
it follows from the previous section that the hypothetical population is
given by

$$\frac{y-Y}{\sigma_y} = r\,\frac{x-X}{\sigma_x},$$

where $X$, $Y$ are the means of the $x$'s and $y$'s, $\sigma_x$,
$\sigma_y$ are the corresponding standard deviations, and

$$r = \frac{\sum (X-x_r)(Y-y_r)}{\sqrt{\sum (X-x_r)^2\,\sum (Y-y_r)^2}}.$$

The curve so obtained represents a special member of an assumed class of
hypothetical population, called the ‘line of regression’ of $y$ on $x$,
for it measures the extent to which a variation in $x$ effects a change
in $y$; in fact, when $x$ changes by $\sigma_x$, $y/\sigma_y$ changes by
$r$.

We could, however, have approached the same problem by choosing a second
class of hypothetical population on the assumption that the $y$'s were
correct values and that the $x$'s involved errors. It is easy to see
that the member of the population then selected would be

$$\frac{x-X}{\sigma_x} = r\,\frac{y-Y}{\sigma_y}.$$

This represents the line of regression of $x$ on $y$; when $y$ changes
by $\sigma_y$, $x/\sigma_x$ changes by $r$. Thus $r$ is a measure common
to both the hypothetical populations; it is called the ‘coefficient of
linear correlation’ and is taken to be a measure of the extent to which
the sets of numbers $x$ and $y$ are interlinked.

It is clear that if the two lines of regression are coincident, then
$r^2 = 1$, and if they are at right angles, $r = 0$. In the case
$r = 1$, there is maximum correlation and $x$ and $y$ are linearly
related over the whole range. When $r = 0$ the variation in $x$ has no
influence on the variation in $y$. Thus $r$ is a number whose absolute
magnitude lies between 0 and 1. We note that $r$ may, however, be
negative, in which case an increase in $x$ is accompanied by a decrease
in $y$, and vice versa.

Ex. Two sets of numbers are chosen in the intervals (0, 4), (5, 9),
(10, 14), ..., (30, 34), with the following results:

    x |  1    6   12   16   20   25   32
    y |  3    6   13   16   22   28   31

We thus obtain

$$X = 16, \quad Y = 17, \quad \sum (X-x)(Y-y) = 679,$$
$$\sum (X-x)^2 = 694, \quad \sum (Y-y)^2 = 676,$$

so that $r$ is given by

$$r = \frac{679}{\sqrt{694 \times 676}} = \frac{679}{685} = 0.99, \text{ approximately.}$$

Generally, if we have two sets of numbers $x_1, x_2, \ldots, x_n$ and
$y_1, y_2, \ldots, y_n$ such that $x_r$ and $y_r$ lie in the interval
$(t_r, t_{r+1})$, and if the differences $t_{r+1}-t_r$ are small and
equal for all values of $r$, then $x_r$ and $y_r$ will correlate almost
exactly linearly.
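
The sums quoted in this example can be verified directly. In the Python
sketch below the seventh pair (32, 31) is an assumption: it is the pair
required by the stated means $X = 16$, $Y = 17$, and it also reproduces
the three stated sums.

```python
import math

# Pairs from the example; the seventh pair (32, 31) is an assumption
# inferred from the stated means X = 16 and Y = 17.
xs = [1, 6, 12, 16, 20, 25, 32]
ys = [3, 6, 13, 16, 22, 28, 31]

X = sum(xs) / len(xs)
Y = sum(ys) / len(ys)
s_xy = sum((X - x) * (Y - y) for x, y in zip(xs, ys))
s_xx = sum((X - x) ** 2 for x in xs)
s_yy = sum((Y - y) ** 2 for y in ys)
r = s_xy / math.sqrt(s_xx * s_yy)
print(X, Y, s_xy, s_xx, s_yy, round(r, 2))
```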

The method we have used to find the coefficient of linear correlation is
capable of immediate extension. Thus, for parabolic correlation, we wish
to find the value of $\lambda$ for which the hypothetical population

$$Y' = \lambda X'^2, \quad \text{where } X' = \frac{X-x}{\sigma_x}, \quad Y' = \frac{Y-y}{\sigma_y},$$

best fits the given numbers $(X'_1, Y'_1), \ldots, (X'_n, Y'_n)$.

We have therefore to choose $\lambda$ so that
$\sum (Y'_r-\lambda X'^2_r)^2$ is a minimum.

Hence

$$\lambda = \frac{\sum X'^2_r Y'_r}{\sum X'^4_r}
= \frac{\sigma_x^2}{\sigma_y}\,\frac{\sum (X-x_r)^2(Y-y_r)}{\sum (X-x_r)^4}
= \frac{\sum (X-x_r)^2\,\sum (X-x_r)^2(Y-y_r)}{\sqrt{n}\,\sqrt{\sum (Y-y_r)^2}\,\sum (X-x_r)^4}.$$

The interpretation of $\lambda$ in this case is, of course, quite
different from that for $r$ in the previous case.

The Method of Maximum Correlation

Let $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ be two sets
of observations obtained by two different experimenters to represent
variations in the same phenomena at positions $x$ which are accurately
given. If the experiments have been carefully performed and the
differences between corresponding pairs $(x_s, y_s)$ of observations are
due only to accidental errors, it follows that the two sets will be
highly correlated; in other words, the line of regression of $x$ on $y$
or of $y$ on $x$ will be very near the line $y = x$. We wish to
determine from these observations a third set $(z_1, z_2, \ldots, z_n)$
which correlates most highly with the given sets; that is to say, if
$r_{xz}$ and $r_{yz}$ are the correlation coefficients between the $x$'s
and the $z$'s and between the $y$'s and the $z$'s, respectively, then
the $z$'s are to be chosen so as to make some symmetric function
$F(r_{xz}, r_{yz})$ a maximum. Each such function defines a class of
populations. Consider in particular

$$F = r_{xz}+r_{yz}.$$

Let $X$, $Y$, and $Z$ be the means of the three sets of observations,
and $\xi_s$, $\eta_s$, and $\zeta_s$ the deviations of $x_s$, $y_s$, and
$z_s$ each from its mean; then

$$\sum_{1}^{n}\xi_s = \sum_{1}^{n}\eta_s = \sum_{1}^{n}\zeta_s = 0,$$

$$r_{xy} = \frac{\sum \xi_s\eta_s}{\sqrt{\sum \xi_s^2\,\sum \eta_s^2}}, \quad
r_{xz} = \frac{\sum \xi_s\zeta_s}{\sqrt{\sum \xi_s^2\,\sum \zeta_s^2}}, \quad
r_{yz} = \frac{\sum \eta_s\zeta_s}{\sqrt{\sum \eta_s^2\,\sum \zeta_s^2}}.$$

To simplify the notation we write

$$\frac{\xi_s}{\sqrt{\sum \xi_s^2}} = a_s, \quad
\frac{\eta_s}{\sqrt{\sum \eta_s^2}} = b_s, \quad
\frac{\zeta_s}{\sqrt{\sum \zeta_s^2}} = c_s.$$

Then the above relations may be replaced by

$$\sum a_s = \sum b_s = \sum c_s = 0,$$
$$\sum a_s^2 = \sum b_s^2 = \sum c_s^2 = 1,$$
$$r_{xy} = \sum a_s b_s, \quad r_{xz} = \sum a_s c_s, \quad r_{yz} = \sum b_s c_s.$$

Now if the function $F$ is to be a maximum we require

$$\delta F = \delta r_{xz}+\delta r_{yz} = 0, \qquad (1)$$

where

$$\delta r_{xz} = \sum a_s\,\delta c_s \quad \text{and} \quad \delta r_{yz} = \sum b_s\,\delta c_s. \qquad (2)$$

Substituting from (2) in (1) we thus require

$$\sum (a_s+b_s)\,\delta c_s = 0. \qquad (3)$$

From the restrictive conditions on $c_s$ we have

$$\sum c_s\,\delta c_s = 0 \quad \text{and} \quad \sum \delta c_s = 0. \qquad (4)$$

Hence, from (3) and (4) we obtain

$$\sum (a_s+b_s+\lambda c_s+\mu)\,\delta c_s = 0,$$

where $\lambda$ and $\mu$ are constants to be determined.

Equating the coefficients of $\delta c_s$ to zero we have

$$a_s+b_s+\lambda c_s+\mu = 0 \quad (s = 1, 2, \ldots, n). \qquad (5)$$

Summing the $n$ relations (5) we obtain

$$\sum a_s+\sum b_s+\lambda\sum c_s+n\mu = 0,$$

whence we deduce that $\mu = 0$.

Multiplying (5) by $c_s$ and summing, we have

$$r_{xz}+r_{yz}+\lambda = 0.$$

Accordingly (5) takes the form

$$a_s+b_s = (r_{xz}+r_{yz})\,c_s \quad (s = 1, 2, \ldots, n). \qquad (6)$$

Multiplying (6) by $a_s$ and $b_s$ respectively and summing, we obtain

$$1+r_{xy} = (r_{xz}+r_{yz})\,r_{xz}, \qquad 1+r_{xy} = (r_{xz}+r_{yz})\,r_{yz}. \qquad (7)$$

From (7) it follows that

$$r_{xz} = r_{yz}. \qquad (8)$$

Equations (6), (7), and (8) serve to determine $r_{xz}$, $r_{yz}$, and
$c_s$. Thus from (7) and (8) we have

$$r_{xz} = r_{yz} = \sqrt{\tfrac{1}{2}(1+r_{xy})}, \qquad (9)$$

and from (6)

$$c_s = \frac{a_s+b_s}{\sqrt{2(1+r_{xy})}}. \qquad (10)$$

Since $r_{xy}$ is positive in the case with which we are concerned, and
is moreover less than unity, it follows from (10) that $c_s$ is slightly
greater than the mean of $a_s$ and $b_s$.

Returning now to our original notation we have still to determine $z_s$.

We have

$$z_s = Z+\zeta_s = Z+\gamma c_s, \qquad (11)$$

where $\gamma = \sqrt{\sum \zeta_s^2}$ and $Z$ are unknown.

We propose to determine the latter by the method of least squares. We
have thus to make

$$\sum \{(z_s-x_s)^2+(z_s-y_s)^2\},$$

i.e.

$$\sum \{(Z+\gamma c_s-X-\xi_s)^2+(Z+\gamma c_s-Y-\eta_s)^2\},$$

a minimum for variations in $Z$ and $\gamma$.

We thus obtain the equations

$$Z = \tfrac{1}{2}(X+Y) \qquad (12)$$

and

$$2\gamma\sum c_s^2 = \sum \xi_s c_s+\sum \eta_s c_s,$$

or

$$2\gamma = r_{xz}\Bigl\{\sqrt{\sum \xi_s^2}+\sqrt{\sum \eta_s^2}\Bigr\}, \quad \text{by (8).}$$

In terms of the standard deviations this result may be written as

$$\gamma = \tfrac{1}{2}\sqrt{n}\,r_{xz}(\sigma_x+\sigma_y). \qquad (13)$$

Hence

$$z_s = Z+\gamma c_s
= \tfrac{1}{2}(X+Y)+\tfrac{1}{2}\sqrt{n}\,r_{xz}(\sigma_x+\sigma_y)\,\frac{a_s+b_s}{2r_{xz}}
= \tfrac{1}{2}(X+Y)+\tfrac{1}{4}(\sigma_x+\sigma_y)\Bigl(\frac{\xi_s}{\sigma_x}+\frac{\eta_s}{\sigma_y}\Bigr). \qquad (14)$$

Thus all the constants in the calculation of the set $(z_s)$ have been
determined.
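
Formula (14) and the relation (9) can be checked on a small numerical
case. The Python sketch below is illustrative only: the function names
`corr` and `max_correlation_set` and the observation values are
assumptions, not taken from the text.

```python
import math

def corr(us, vs):
    """Coefficient of linear correlation between two equal-length lists."""
    n = len(us)
    U, V = sum(us) / n, sum(vs) / n
    s_uv = sum((u - U) * (v - V) for u, v in zip(us, vs))
    s_uu = sum((u - U) ** 2 for u in us)
    s_vv = sum((v - V) ** 2 for v in vs)
    return s_uv / math.sqrt(s_uu * s_vv)

def max_correlation_set(xs, ys):
    """z_s = (X+Y)/2 + (sigma_x+sigma_y)/4 * (xi_s/sigma_x + eta_s/sigma_y)."""
    n = len(xs)
    X, Y = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - X) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - Y) ** 2 for y in ys) / n)
    return [0.5 * (X + Y) + 0.25 * (sx + sy) * ((x - X) / sx + (y - Y) / sy)
            for x, y in zip(xs, ys)]

xs = [1.0, 2.1, 2.9, 4.2, 5.0]   # hypothetical observations
ys = [1.2, 1.9, 3.1, 3.9, 5.1]
zs = max_correlation_set(xs, ys)
print(corr(xs, zs), corr(ys, zs), math.sqrt(0.5 * (1 + corr(xs, ys))))
```

The three printed numbers should agree, as required by (8) and (9).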

For the application of this method to the general case of $m$ given sets
of observations and for more general forms of the function $F$,
reference may be made to a recent paper.†


Linear Correlation in General 


Suppose that

$$x_1, \ldots, x_n$$
$$y_1, \ldots, y_n$$

is a given system of data. We may inquire which member of the class of
hypothetical populations $x\cos\alpha+y\sin\alpha = p$, where $\alpha$
and $p$ are variable, will provide this system of data with the greatest
probability.

† H. Levy and J. C. Gascoigne, Proc. Phys. Soc. 48 (1935).

Let us suppose that $(X_r, Y_r)$ is the point on this line to which
$(x_r, y_r)$ is an empirical approximation, and that the errors in the
placing of $x_r$ and $y_r$ occur independently with frequencies
determined by the same Gaussian law. Thus the probability of an error
$X_r-x_r$ is proportional to $\exp\{-h^2(X_r-x_r)^2\}$ and that of an
error $Y_r-y_r$ is proportional to $\exp\{-h^2(Y_r-y_r)^2\}$. The
probability of obtaining the whole set of data is therefore proportional
to

$$\exp\{-h^2(X_1-x_1)^2\}\exp\{-h^2(Y_1-y_1)^2\}\cdots\exp\{-h^2(X_n-x_n)^2\}\exp\{-h^2(Y_n-y_n)^2\}
= \exp\Bigl(-h^2\sum [(X_r-x_r)^2+(Y_r-y_r)^2]\Bigr).$$

If this is to be a maximum we require that

$$\sum [(X_r-x_r)^2+(Y_r-y_r)^2]$$

shall be a minimum.

Geometrically, this expression represents the sum of the squares of the
distances of the points $(x_r, y_r)$ from the corresponding points
$(X_r, Y_r)$ on the line

$$x\cos\alpha+y\sin\alpha = p. \qquad (1)$$

Now unless $(X_r, Y_r)$ is the foot of the perpendicular from
$(x_r, y_r)$ on (1), the given expression will certainly not attain its
least value. Since the perpendicular from $(x_r, y_r)$ on (1) is of
length $x_r\cos\alpha+y_r\sin\alpha-p$, we have to determine $\alpha$
and $p$ so that

$$\sum (x_r\cos\alpha+y_r\sin\alpha-p)^2$$

is a minimum. We thus require

$$\sum (x_r\cos\alpha+y_r\sin\alpha-p) = 0 \qquad (2)$$

and

$$\sum (x_r\cos\alpha+y_r\sin\alpha-p)(x_r\sin\alpha-y_r\cos\alpha) = 0. \qquad (3)$$

Equation (2) may be written in the form

$$\frac{\sum x_r}{n}\cos\alpha+\frac{\sum y_r}{n}\sin\alpha-p = 0,$$

which shows that the mean position $(X, Y)$ of the points $(x_r, y_r)$
lies on (1). Writing, as before, $x_r = X+\xi_r$, $y_r = Y+\eta_r$, so
that $\sum \xi_r = \sum \eta_r = 0$, equations (2) and (3) become

$$X\cos\alpha+Y\sin\alpha = p, \qquad (4)$$

$$\sum (\xi_r\cos\alpha+\eta_r\sin\alpha)(\xi_r\sin\alpha-\eta_r\cos\alpha+X\sin\alpha-Y\cos\alpha) = 0,$$

or

$$(\cos^2\alpha-\sin^2\alpha)\sum \xi_r\eta_r = \sin\alpha\cos\alpha\sum (\xi_r^2-\eta_r^2), \qquad (5)$$

whence

$$\tan 2\alpha = \frac{2\sum \xi_r\eta_r}{\sum (\xi_r^2-\eta_r^2)}. \qquad (6)$$

Thus the required hypothetical population is given by

$$(x-X)\cos\alpha+(y-Y)\sin\alpha = 0, \qquad (7)$$

where $\alpha$ is determined by (6).
Eliminating $\alpha$ between (6) and (7), we find that (7) is one of the
pair of lines whose equation is

$$\frac{(x-X)(y-Y)}{(x-X)^2-(y-Y)^2} = \frac{\sum \xi_r\eta_r}{\sum (\xi_r^2-\eta_r^2)}. \qquad (8)$$

It will be observed that these lines pass through $(X, Y)$ and are at
right angles to one another; one of them determines the most probable
and the other the least probable of hypothetical populations represented
by lines passing through $(X, Y)$. The required line lies within the
acute angle between the regression lines of $x$ on $y$ and of $y$ on
$x$, and bisects that angle when $\sigma_x = \sigma_y$.

When $\sigma_x = \sigma_y$, it follows from (8) that the line (7) has
the equation $y-Y = \pm(x-X)$.
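
Equation (6) yields two directions a quarter-turn apart, one giving the
least and the other the greatest sum of squared perpendiculars. The
Python sketch below picks the minimising one; it is an illustration
under the section's assumptions (equal, independent Gaussian errors in
both coordinates), and the function name `orthogonal_line` is
hypothetical.

```python
import math

def orthogonal_line(xs, ys):
    """Return (alpha, p) of the line x cos(alpha) + y sin(alpha) = p that
    minimises the sum of squared perpendicular distances."""
    n = len(xs)
    X, Y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - X) * (y - Y) for x, y in zip(xs, ys))
    s_xx = sum((x - X) ** 2 for x in xs)
    s_yy = sum((y - Y) ** 2 for y in ys)

    def sum_sq(alpha):
        # Sum of (xi cos(alpha) + eta sin(alpha))^2 for a line through (X, Y).
        c, s = math.cos(alpha), math.sin(alpha)
        return c * c * s_xx + 2 * c * s * s_xy + s * s * s_yy

    # tan(2 alpha) = 2 sum(xi eta) / sum(xi^2 - eta^2) has two roots a
    # quarter-turn apart; keep the one giving the smaller sum of squares.
    alpha = 0.5 * math.atan2(2 * s_xy, s_xx - s_yy)
    best = min((alpha, alpha + math.pi / 2), key=sum_sq)
    return best, X * math.cos(best) + Y * math.sin(best)

alpha, p = orthogonal_line([0, 1, 2, 3], [1, 2, 3, 4])
print(alpha, p)
```

For points lying exactly on a line, as in the usage above, every
perpendicular distance from the fitted line vanishes.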

The Gaussian Law for Two Variables: Correlation 
We can approach the problem of correlation in the following 
way. 

Let $\eta_1$, $\eta_2$ be the deviations of two sets of quantities from
their respective means, and suppose $\eta_1$ and $\eta_2$ are each
determined from elements $\epsilon_1$, $\epsilon_2$, themselves also
deviations from their means, and distributed about these means according
to the Gaussian law. Suppose

$$\eta_1 = a\epsilon_1+b\epsilon_2, \quad \eta_2 = a'\epsilon_1+b'\epsilon_2,$$

or

$$\epsilon_1 = A\eta_1+B\eta_2, \quad \epsilon_2 = R\eta_1+S\eta_2.$$

Then the probability of the occurrence of the $\epsilon$'s
simultaneously within the ranges $(\epsilon_1, \epsilon_1+\delta\epsilon_1)$
and $(\epsilon_2, \epsilon_2+\delta\epsilon_2)$ is

$$\frac{h_1}{\sqrt{\pi}}\exp(-h_1^2\epsilon_1^2)\,d\epsilon_1\,\frac{h_2}{\sqrt{\pi}}\exp(-h_2^2\epsilon_2^2)\,d\epsilon_2
= \frac{h_1h_2}{\pi}\exp(-h_1^2\epsilon_1^2-h_2^2\epsilon_2^2)\,d\epsilon_1\,d\epsilon_2.$$

If for the $\epsilon$'s we substitute their values in terms of the
$\eta$'s as above, the result will give the probability of the
occurrence of the two characteristics $\eta_1$ and $\eta_2$ in the
ranges $(\eta_1, \eta_1+\delta\eta_1)$, $(\eta_2, \eta_2+\delta\eta_2)$,
which is proportional to

$$\exp\{-(\lambda\eta_1^2+2\mu\eta_1\eta_2+\nu\eta_2^2)\}\,\delta\eta_1\,\delta\eta_2,$$

an extension of the Gaussian law.

We are now in a position to generalize and interpret this 
expression. 
As before, let $\eta_1$ and $\eta_2$ be the deviations of two measurable
characteristics each from its mean. The problem is to represent the
linkage that shows itself, if at all, between the quantities $\eta_1$
and $\eta_2$. Let them be determined by the corresponding deviations of
a number of contributory elements, $\epsilon_1, \epsilon_2, \ldots,
\epsilon_m$, each from its mean. Then

$$\eta_1 = a_1\epsilon_1+a_2\epsilon_2+\cdots+a_m\epsilon_m,$$
$$\eta_2 = b_1\epsilon_1+b_2\epsilon_2+\cdots+b_m\epsilon_m.$$

Let us assume that each of the $\epsilon_r$ contributions conforms to
the Gaussian law of error with precision constant $h_r$; then the
compound probability that the $\epsilon$'s lie respectively and
simultaneously in the ranges

$$(\epsilon_1, \epsilon_1+\delta\epsilon_1),\ (\epsilon_2, \epsilon_2+\delta\epsilon_2),\ \ldots,\ (\epsilon_m, \epsilon_m+\delta\epsilon_m)$$

is

$$R = A\exp\Bigl(-\sum h_r^2\epsilon_r^2\Bigr)\,d\epsilon_1\,d\epsilon_2\cdots d\epsilon_m,$$

since they vary independently.

In this expression substitute for $\epsilon_1$ and $\epsilon_2$ in terms
of $\eta_1$ and $\eta_2$; then the compound probability that $\eta_1$
and $\eta_2$ lie in the ranges $(\eta_1, \eta_1+\delta\eta_1)$ and
$(\eta_2, \eta_2+\delta\eta_2)$, respectively, as well as that the
remaining $\epsilon$'s should lie in the ranges

$$(\epsilon_3, \epsilon_3+\delta\epsilon_3),\ \ldots,\ (\epsilon_m, \epsilon_m+\delta\epsilon_m),$$

is of the form

$$Q = Be^{-U}\,\delta\eta_1\,\delta\eta_2\,\delta\epsilon_3\cdots\delta\epsilon_m,$$

where $U$ is the sum of

(i) a quadratic function of $\eta_1$ and $\eta_2$

(ii) a quadratic function of $\epsilon_3, \ldots, \epsilon_m$

(iii) a function linear in $\eta_1, \eta_2, \epsilon_3, \ldots, \epsilon_m$.

If $Q$ be integrated with respect to the $\epsilon$'s from $-\infty$ to
$+\infty$ the result will give the probability of the occurrence of the
two characteristics $\eta_1$ and $\eta_2$ in the ranges
$(\eta_1, \eta_1+\delta\eta_1)$, $(\eta_2, \eta_2+\delta\eta_2)$.
Finally we obtain

$$P = C\exp\{-\tfrac{1}{2}(C_1\eta_1^2+2C_{12}\eta_1\eta_2+C_2\eta_2^2)\}\,\delta\eta_1\,\delta\eta_2,$$

clearly an extension of the Gaussian law of error. If $\eta_1$ and
$\eta_2$ were quantities that could be chosen independently with
standard deviations $\sigma_1$ and $\sigma_2$ respectively, then we
should have as the probability of the occurrence of $\eta_1$ and
$\eta_2$ in the given ranges

$$P' = \frac{1}{2\pi\sigma_1\sigma_2}\exp\Bigl(-\frac{\eta_1^2}{2\sigma_1^2}-\frac{\eta_2^2}{2\sigma_2^2}\Bigr)\,\delta\eta_1\,\delta\eta_2.$$

The presence of the term $\eta_1\eta_2$ in the quadratic expression
brings out the linkage between the two quantities.

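Write

$$C_1\eta_1^2+2C_{12}\eta_1\eta_2+C_2\eta_2^2
= \Bigl(C_1-\frac{C_{12}^2}{C_2}\Bigr)\eta_1^2+C_2\Bigl(\eta_2+\frac{C_{12}}{C_2}\eta_1\Bigr)^2.$$

Now write

$$C_1-\frac{C_{12}^2}{C_2} = \frac{1}{\sigma_1^2}, \qquad C_2-\frac{C_{12}^2}{C_1} = \frac{1}{\sigma_2^2},$$

and

$$r = -\frac{C_{12}}{\sqrt{C_1C_2}}.$$

Thus

$$\frac{1}{\sigma_1^2} = C_1(1-r^2); \qquad \frac{1}{\sigma_2^2} = C_2(1-r^2).$$

Moreover, since $P$ is to be a probability, when integrated with respect
to $\eta_1$ and $\eta_2$ from $-\infty$ to $+\infty$, it has the value
unity.

Hence

$$1 = C\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\{-\tfrac{1}{2}(C_1\eta_1^2+2C_{12}\eta_1\eta_2+C_2\eta_2^2)\}\,d\eta_1\,d\eta_2.$$

This integral is evaluated below with the result

$$1 = \frac{2\pi C}{\sqrt{C_1C_2-C_{12}^2}} = 2\pi C\sigma_1\sigma_2\sqrt{1-r^2}.$$

Accordingly the law of error takes the form†

$$P = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}
\exp\Bigl\{-\frac{1}{2(1-r^2)}\Bigl(\frac{\eta_1^2}{\sigma_1^2}-\frac{2r\eta_1\eta_2}{\sigma_1\sigma_2}+\frac{\eta_2^2}{\sigma_2^2}\Bigr)\Bigr\}\,\delta\eta_1\,\delta\eta_2.$$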

† In this connexion we may note Mehler’s series for the correlation function


where H r (x) is the Hermite polynomial (see p. 138). 
It is clear that when there is no correlation ($r = 0$), i.e. when the
term $\eta_1\eta_2$ is absent, $\sigma_1$ and $\sigma_2$ become simply
the standard deviations of the $\eta_1$'s and $\eta_2$'s in that case.
That they still bear this interpretation even in the case of linkage may
be seen from the following considerations.

Write

$$z = C\exp\{-\tfrac{1}{2}(C_1x^2+2C_{12}xy+C_2y^2)\}.$$

Then from the integrals evaluated below we have for the second moment of
$x$

$$I_1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} zx^2\,dx\,dy
= \frac{2\pi C_2C}{(C_1C_2-C_{12}^2)^{3/2}} = \sigma_1^2.$$

Similarly,

$$I_2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} zy^2\,dx\,dy
= \frac{2\pi C_1C}{(C_1C_2-C_{12}^2)^{3/2}} = \sigma_2^2,$$

$$J = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} zxy\,dx\,dy
= \frac{-2\pi C_{12}C}{(C_1C_2-C_{12}^2)^{3/2}} = r\sigma_1\sigma_2.$$

The integrals $I_1$ and $I_2$ are the squares of the standard deviations
of the $\eta$'s, while $J$ is the mean value of the products
$\eta_1\eta_2$.

Accordingly,

$$r = \frac{J}{\sigma_1\sigma_2} = \frac{J}{\sqrt{I_1I_2}},$$

or for computational purposes we write

$$r = \frac{\sum \eta_1\eta_2}{\sqrt{\sum \eta_1^2\,\sum \eta_2^2}}.$$
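It remains to evaluate the integrals referred to above. Consider

$$A = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\{-\tfrac{1}{2}(ax^2-2hxy+by^2)\}\,dx\,dy.$$

Now

$$ax^2-2hxy+by^2 = a\Bigl(x-\frac{hy}{a}\Bigr)^2+\frac{\Delta}{a}y^2,$$

where $\Delta = ab-h^2$.

Also

$$\int_{-\infty}^{\infty}\exp(-\alpha^2x^2)\,dx = \frac{\sqrt{\pi}}{\alpha}.$$

Thus

$$\int_{-\infty}^{\infty}\exp\{-\tfrac{1}{2}(ax^2-2hxy+by^2)\}\,dx = \sqrt{\frac{2\pi}{a}}\exp\Bigl(-\frac{\Delta}{2a}y^2\Bigr).$$

Hence

$$A = \sqrt{\frac{2\pi}{a}}\int_{-\infty}^{\infty}\exp\Bigl(-\frac{\Delta}{2a}y^2\Bigr)\,dy
= \sqrt{\frac{2\pi}{a}}\,\frac{\sqrt{\pi}}{\sqrt{\Delta/2a}} = \frac{2\pi}{\sqrt{\Delta}} = \frac{2\pi}{\sqrt{ab-h^2}}.$$

By differentiating under the integral sign, with respect to $a$, $h$,
and $b$, we obtain

$$\int\!\!\int x^2\exp\{-\tfrac{1}{2}(ax^2-2hxy+by^2)\}\,dx\,dy = \frac{2\pi b}{(ab-h^2)^{3/2}},$$

$$\int\!\!\int xy\exp\{-\tfrac{1}{2}(ax^2-2hxy+by^2)\}\,dx\,dy = \frac{2\pi h}{(ab-h^2)^{3/2}},$$

$$\int\!\!\int y^2\exp\{-\tfrac{1}{2}(ax^2-2hxy+by^2)\}\,dx\,dy = \frac{2\pi a}{(ab-h^2)^{3/2}}.$$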
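
These closed forms can be checked by crude numerical quadrature. The
Python sketch below is illustrative only; the function name
`double_integral`, the grid settings, and the particular values of $a$,
$h$, $b$ are assumptions chosen so that $ab-h^2 > 0$.

```python
import math

def double_integral(weight, a, h, b, half_width=10.0, step=0.05):
    """Midpoint-rule estimate of the integral over the plane of
    weight(x, y) * exp(-(a x^2 - 2 h x y + b y^2) / 2)."""
    m = int(round(2 * half_width / step))
    total = 0.0
    for i in range(m):
        x = -half_width + (i + 0.5) * step
        for j in range(m):
            y = -half_width + (j + 0.5) * step
            q = a * x * x - 2 * h * x * y + b * y * y
            total += weight(x, y) * math.exp(-0.5 * q)
    return total * step * step

a, h, b = 2.0, 0.5, 1.0              # arbitrary values with ab - h^2 > 0
delta = a * b - h * h
print(double_integral(lambda x, y: 1.0, a, h, b), 2 * math.pi / math.sqrt(delta))
print(double_integral(lambda x, y: x * y, a, h, b), 2 * math.pi * h / delta ** 1.5)
```

Each printed pair compares the numerical estimate with the closed form.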

Tests of Significance for Small Samples

One of the most important contributions which statistical analysis has
made to experimental practice lies in what are called ‘tests of
significance’. Suppose that a series of measurements is made of a
quantity which in ‘normal’ circumstances would have the value $m$. From
a study of the observations, can we say that these are themselves normal
measures of $m$? If not, can some measure be found to estimate the
degree of non-normality? For example, a collection of trees is sprayed
with an insecticide, and after a lapse of time the number of insects
upon them is counted. A corresponding series of unsprayed trees
(controls), of equal number, is also counted for insects. We may ask
whether the difference between the average number of insects per tree in
the two series is sufficiently great for us to assert that the effect of
spraying has been significant. From the point of view of probability we
may regard the problem in this light: we may say that there are $n$
numbers $x_1, x_2, \ldots, x_n$, whose average is $\bar{x}$; $m$ is the
mean to be anticipated if $n$ were of infinite extent and if no factor
had operated to disturb the equilibrium of the series. In asking,
therefore, what is the significance of $\bar{x}-m$, we are really
inquiring with what probability one might expect a deviation of
$\bar{x}$ from $m$ to have as large a magnitude as this, under so-called
‘random’ conditions, i.e. when the numbers $x_1, x_2, \ldots, x_n$ are
chosen about $m$ according to a ‘random’ law, for present purposes the
Gaussian.

Once that probability has been found, it will be possible to express in
any given case the significance of the deviation in terms of the
probability that it will arise at random. If it is very probable that a
deviation of this amount will occur in a random sample of $n$, then
there is little experimental significance in the deviation found; and
conversely. It should be remarked that, in expressing the significance
of the deviation in this way in terms of probability, we are really
referring it to the significance of the probability, a matter which, as
we have previously remarked, is to be decided finally by the
experimenter himself. It is clear that a corresponding investigation of
significance can be made for the probability of occurrence of a
deviation in any other typical constant from that of an assumed infinite
population.

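Let there be a population, in number $N$, distributed according to the
normal law

$$y = \frac{N}{\sigma\sqrt{2\pi}}\exp\Bigl\{-\frac{(x-m)^2}{2\sigma^2}\Bigr\}, \qquad (1)$$

where $\sigma$ is the standard deviation of the population and $m$ is
its mean.

Suppose that a sample $n$ in number is drawn from it, having magnitudes
$x_1, x_2, \ldots, x_n$. We can write down the probability that the
members of the sample should lie between $x_1$ and $x_1+dx_1$, $x_2$ and
$x_2+dx_2$, ..., $x_n$ and $x_n+dx_n$. This is

$$P = \frac{1}{N}\,\frac{N}{\sigma\sqrt{2\pi}}\exp\Bigl\{-\frac{(x_1-m)^2}{2\sigma^2}\Bigr\}dx_1
\times\cdots\times
\frac{1}{N}\,\frac{N}{\sigma\sqrt{2\pi}}\exp\Bigl\{-\frac{(x_n-m)^2}{2\sigma^2}\Bigr\}dx_n,$$

i.e.

$$P = \Bigl(\frac{1}{\sigma\sqrt{2\pi}}\Bigr)^n\exp\Bigl\{-\frac{1}{2\sigma^2}\sum (x_r-m)^2\Bigr\}\,dx_1\,dx_2\cdots dx_n.$$

Thus

$$P = A\exp\Bigl\{-\frac{1}{2\sigma^2}\Bigl[\sum (x_r-\bar{x})^2+n(\bar{x}-m)^2\Bigr]\Bigr\}\,dx_1\,dx_2\cdots dx_n, \qquad (2)$$

where $A$ is a constant and $\bar{x} = \sum x_r/n$.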
We now represent the sample by a point $P$ in space of $n$ dimensions
having coordinates $(x_1, x_2, \ldots, x_n)$. Then

$$x_1 = x_2 = \cdots = x_n$$

is the line which is equally inclined to the coordinate axes. The
perpendicular distance of $P$ from this line is given by

$$PM^2 = (x_1-\bar{x})^2+(x_2-\bar{x})^2+\cdots+(x_n-\bar{x})^2,$$

where $M$ is the point $(\bar{x}, \bar{x}, \ldots, \bar{x})$.

Thus $PM = s\sqrt{n}$, where $s$ is the standard deviation of the
sample. Hence, given $\bar{x}$ and, therefore, $M$, for a fixed $s$, the
point $P$ must lie on a sphere of $(n-1)$ dimensions with centre at $M$
and radius $s\sqrt{n}$.

An element of volume in this space may thus be expressed in terms of the
variation of $\bar{x}$, namely $d\bar{x}$, and the variation
$d(s^{n-1})$ in surface area. Thus the volume element can be written as

$$Cs^{n-2}\,ds\,d\bar{x},$$

where $C$ is some constant.

We now see that this representation of our sample, together with the
symmetrical nature of the expressions for $\bar{x}$ and $s$, enables us
to replace (2) by the formula

$$P = Cs^{n-2}\exp\Bigl\{-\frac{1}{2\sigma^2}\Bigl[\sum (x_r-\bar{x})^2+n(\bar{x}-m)^2\Bigr]\Bigr\}\,ds\,d\bar{x}, \qquad (3)$$

in which $\sum (x_r-\bar{x})^2 = ns^2$.

This represents the probability that a sample will be drawn from the
population, having a mean lying between $\bar{x}$ and
$\bar{x}+d\bar{x}$, and a standard deviation between $s$ and $s+ds$. It
follows that, given the standard deviation $s$, the law of distribution
of the means of samples is represented by the normal curve

$$z = z_0\exp\Bigl\{-\frac{n}{2\sigma^2}(\bar{x}-m)^2\Bigr\} \qquad (4)$$

distributed about the same position as (1), but with standard deviation
$\sigma/\sqrt{n}$.

In the same way, if we regard $\bar{x}$ as fixed, the law of
distribution of the standard deviation of samples is given by

$$y = y_0s^{n-2}\exp\Bigl(-\frac{ns^2}{2\sigma^2}\Bigr). \qquad (5)$$
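The constant $y_0$ may be found as follows:

Let

$$I_p = \int_0^{\infty}s^p\exp\Bigl(-\frac{ns^2}{2\sigma^2}\Bigr)\,ds
= \frac{\sigma^2}{n}(p-1)\int_0^{\infty}s^{p-2}\exp\Bigl(-\frac{ns^2}{2\sigma^2}\Bigr)\,ds
= \frac{\sigma^2}{n}(p-1)I_{p-2},$$

upon integration by parts. Hence we obtain

$$I_{n-2} = \Bigl(\frac{\sigma^2}{n}\Bigr)^{\frac{1}{2}(n-2)}(n-3)(n-5)\cdots 1\cdot I_0,$$

or

$$I_{n-2} = \Bigl(\frac{\sigma^2}{n}\Bigr)^{\frac{1}{2}(n-3)}(n-3)(n-5)\cdots 2\cdot I_1,$$

according as $n$ is even or odd.

Evidently we have

$$I_0 = \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{n}} \quad \text{and} \quad I_1 = \frac{\sigma^2}{n}.$$

Since the area under the curve (5) represents the total frequency $N$ of
the population, we obtain

$$y = \frac{N}{(n-3)(n-5)\cdots 3\cdot 1}\sqrt{\frac{2}{\pi}}\Bigl(\frac{n}{\sigma^2}\Bigr)^{\frac{1}{2}(n-1)}s^{n-2}\exp\Bigl(-\frac{ns^2}{2\sigma^2}\Bigr)$$

when $n$ is even, and

$$y = \frac{N}{(n-3)(n-5)\cdots 4\cdot 2}\Bigl(\frac{n}{\sigma^2}\Bigr)^{\frac{1}{2}(n-1)}s^{n-2}\exp\Bigl(-\frac{ns^2}{2\sigma^2}\Bigr) \qquad (6)$$

when $n$ is odd.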

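In a similar manner we obtain the value of the constant $z_0$ in (4).
Denoting by $x$ the distance of the mean of the sample from the mean of
the original population, we have

$$N = z_0\int_{-\infty}^{\infty}\exp\Bigl(-\frac{nx^2}{2\sigma^2}\Bigr)\,dx,$$

i.e.

$$N = z_0\,\sigma\sqrt{\frac{2\pi}{n}}.$$

Thus (4) can be written as

$$z = \sqrt{\frac{n}{2\pi}}\,\frac{N}{\sigma}\exp\Bigl(-\frac{nx^2}{2\sigma^2}\Bigr). \qquad (7)$$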
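Now introduce the variable $\zeta$ defined by

$$\zeta = x/s. \qquad (8)$$

The probability of obtaining a value of $s$ lying between $s$ and $s+ds$
is, by (6),

$$\frac{y\,ds}{N}.$$

The probability of obtaining a value of $x$ lying between $x$ and $x+dx$
is, by (7),

$$\sqrt{\frac{n}{2\pi}}\,\frac{1}{\sigma}\exp\Bigl(-\frac{nx^2}{2\sigma^2}\Bigr)\,dx
= \sqrt{\frac{n}{2\pi}}\,\frac{s}{\sigma}\exp\Bigl(-\frac{ns^2\zeta^2}{2\sigma^2}\Bigr)\,d\zeta.$$

This is also the probability of obtaining a value of $\zeta$ between
$\zeta$ and $\zeta+d\zeta$ for a given value of $s$.

Hence, the probability of obtaining a value of $\zeta$ between $\zeta$
and $\zeta+d\zeta$, while $s$ lies between $s$ and $s+ds$, is

$$\delta P = \frac{y\,ds}{N}\sqrt{\frac{n}{2\pi}}\,\frac{s}{\sigma}\exp\Bigl(-\frac{ns^2\zeta^2}{2\sigma^2}\Bigr)\,d\zeta.$$

It follows that the probability of obtaining a value of $\zeta$ between
$\zeta$ and $\zeta+d\zeta$ for any value of $s$ is

$$dP = d\zeta\,\sqrt{\frac{n}{2\pi}}\,\frac{1}{N\sigma}\int_0^{\infty}ys\exp\Bigl(-\frac{ns^2\zeta^2}{2\sigma^2}\Bigr)\,ds.$$

Hence, by (6), we obtain the results

$$dP = \frac{1}{2}\,\frac{n-2}{n-3}\,\frac{n-4}{n-5}\cdots\frac{5\cdot 3}{4\cdot 2}\,\frac{d\zeta}{(1+\zeta^2)^{\frac{1}{2}n}} \quad (n \text{ odd}),$$

$$dP = \frac{1}{\pi}\,\frac{n-2}{n-3}\,\frac{n-4}{n-5}\cdots\frac{4\cdot 2}{3\cdot 1}\,\frac{d\zeta}{(1+\zeta^2)^{\frac{1}{2}n}} \quad (n \text{ even}). \qquad (9)$$

We note the very remarkable fact that the formula (9) does not involve
the unknown constant $\sigma$: hence its practical importance.

Now write $\zeta = \tan\theta$.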



Chap.IX,§2 DETERMINATION OF A POPULATION 189 

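Then, from (9), the probability of obtaining a value of $\zeta$ lying
between $\zeta_1$ and $-\zeta_1$, say, is

$$P = \frac{1}{2}\,\frac{n-2}{n-3}\,\frac{n-4}{n-5}\cdots\frac{5\cdot 3}{4\cdot 2}\int_{\tan^{-1}(-\zeta_1)}^{\tan^{-1}(\zeta_1)}\cos^{n-2}\theta\,d\theta,$$

or

$$P = \frac{1}{\pi}\,\frac{n-2}{n-3}\,\frac{n-4}{n-5}\cdots\frac{4\cdot 2}{3\cdot 1}\int_{\tan^{-1}(-\zeta_1)}^{\tan^{-1}(\zeta_1)}\cos^{n-2}\theta\,d\theta,$$

according as $n$ is odd or even.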

For further information on this subject, reference should be made to
‘Student’, Biometrika (1908), and R. A. Fisher, Biometrika (1914-15).

Using the above results we can construct a twofold table from which the
significance of a variation of $\zeta$ between $\pm\zeta_1$, i.e. of
$(\bar{x}-m)/s$, can be determined for a given value of $n$. For use in
practice, Fisher has found it convenient to replace $\zeta$ by
$t = \zeta\sqrt{n} = \dfrac{\bar{x}-m}{s}\sqrt{n}$. The values of $t$,
$P$, and $n$ are given in Table IV of Fisher’s Statistical Methods for
Research Workers, where it is to be noted that the $n$ there used is
less by unity than that taken above, and that $m$ is assumed to be zero.

Other Tests for Significance 

In the investigation given on p. 169 we have measured the significance
of a pair of extractions by comparing the probability of obtaining such
a pair with that of obtaining the ‘most likely’ pair. However, this is
by no means the only method of estimating significance: consider, for
example, the following problem.†

Suppose that we have a population of black and white balls in an
unknown proportion $p : (1-p)$, and that we draw from it two samples,
each consisting of 6 balls, which together contain 8 black balls. Thus,
if the first sample contains $r$ black balls, the second will contain
$8-r$ black balls.

                     Black    White
    First sample       r       6-r
    Second sample     8-r      r-2

† Irwin, Metron, 12 (1935), 73.

The probability of obtaining such a pair of samples is

$${}^{6}C_r\,p^r(1-p)^{6-r}\;{}^{6}C_{8-r}\,p^{8-r}(1-p)^{r-2},$$

where $r$ may assume all integral values from 2 to 6.

In the accompanying scheme we give the values assumed by the function
$P(r) = {}^{6}C_r\,{}^{6}C_{8-r}$ as $r$ varies from 2 to 6.

    r    |   2     3     4     5     6
    P(r) |  15   120   225   120    15

It follows that the probability of obtaining a table for which $r = 2$
is

$$15\Big/\sum_{r=2}^{6}P(r) = 15/495 = 1/33.$$

The probability of obtaining an equally probable or less probable table
is $30/495 = 2/33$.

The same method may be employed when the two samples extracted are not
of equal size. Thus, suppose that the first sample contains a fixed
number $a+b$ of balls, and that the second contains a fixed number
$c+d$, while the two samples together always contain $a+c$ black balls.
In the table shown,

                     Black     White
    First sample     a+b-r       r
    Second sample    c-b+r     b+d-r

the number $r$ evidently cannot exceed the lesser of $a+b$ and $b+d$.
The probability of obtaining such a table is

$$\frac{{}^{a+b}C_r\;{}^{c+d}C_{b+d-r}}{{}^{a+b+c+d}C_{a+c}}.$$

Consider, for instance, the following data, which give the number of
cases of measles prevented and not prevented by the use of serum in each
of two different schools.






                 Prevented   Not prevented
    School I        26             2
    School II       61             2
    Totals          87             4

The possible tables which may be enumerated are represented by the
scheme

    28-r      r
    59+r     4-r

in which $r$ may assume the values 0, 1, 2, 3, 4. We now calculate the
corresponding values of the function

$$P(r) = {}^{28}C_r\;{}^{63}C_{4-r}.$$

These are shown in the accompanying table,

    r    |    0          1          2         3        4
    P(r) | 595,665   1,111,908   738,234   206,388   20,475

from which it follows that

$$\sum_{r=0}^{4}P(r) = 2,672,670.$$

Hence, the probability of obtaining a table as improbable as or less
probable than the observed one (for which $r = 2$) is

$$\frac{P(0)+P(2)+P(3)+P(4)}{\sum_{r=0}^{4}P(r)} = \frac{1,560,762}{2,672,670} = 0.584.$$
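
Both enumerations in this section (the two samples of 6 balls and the
measles data) follow the same pattern and can be reproduced as below.
This is a Python sketch for illustration; the function names
`table_weights` and `significance` are assumptions, not taken from the
text.

```python
from math import comb

def table_weights(first, second, marked_total):
    """Weights P(r) = C(first, r) * C(second, marked_total - r) for each
    admissible count r of marked items falling in the first sample."""
    return {r: comb(first, r) * comb(second, marked_total - r)
            for r in range(max(0, marked_total - second),
                           min(first, marked_total) + 1)}

def significance(weights, observed):
    """Probability of a table as improbable as, or less probable than,
    the observed one."""
    total = sum(weights.values())
    tail = sum(w for w in weights.values() if w <= weights[observed])
    return tail / total

measles = table_weights(28, 63, 4)     # r = cases not prevented in School I
print(measles[2], sum(measles.values()), round(significance(measles, 2), 3))

balls = table_weights(6, 6, 8)         # r = black balls in the first sample
print(list(balls.values()), significance(balls, 2))
```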


As a further illustration of how the significance of samples drawn from
a population can be reduced to a comparison of relative probabilities we
examine the following problem.† Two populations each possess a certain
quality in unknown proportions $p_1$ and $p_2$. Samples of magnitude $N$
are drawn from each and found to contain $x_1$ and $x_2$ respectively of
the quality in question. We inquire what is the significance relative to
the possible values of $p_1$ and $p_2$ to be attached to the difference
$x_1-x_2$ found from the two samples. It is clear that the question
becomes important only when $x_1-x_2$ is small compared with $N$.

† See Jeffreys, Proc. Camb. Phil. Soc. 31 (1935), 203.

Accordingly we examine the respective probabilities that samples $x_1$ and $x_2$ will be drawn from the two populations on the assumptions

(1) that $p_1 \neq p_2$,

(2) that $p_1 = p_2$.

In the case (1) the probability of $x_1$ and $x_2$ successes in $N$ trials each is, by Bernoulli's Theorem,

$$A = {}^{N}C_{x_1}\, p_1^{x_1}(1-p_1)^{N-x_1} \times {}^{N}C_{x_2}\, p_2^{x_2}(1-p_2)^{N-x_2}$$

$$\phantom{A} = \frac{(N!)^2}{x_1!\,x_2!\,(N-x_1)!\,(N-x_2)!}\; p_1^{x_1}(1-p_1)^{N-x_1}\, p_2^{x_2}(1-p_2)^{N-x_2}.$$

Now, prior to the drawing of the sample, $p_1$ and $p_2$ may have any values in the range

$$0 < (p_1, p_2) < 1,$$

all, it will be assumed, with equal probability.

Hence, on this basis the probability of drawing two samples $x_1$ and $x_2$ is

$$\int_0^1\!\!\int_0^1 A \, dp_1 \, dp_2.$$


Now

$$\int_0^1 p^m (1-p)^n \, dp = \frac{m!\,n!}{(m+n+1)!}.$$

Hence

$$\int_0^1 p_1^{x_1}(1-p_1)^{N-x_1}\, dp_1 \int_0^1 p_2^{x_2}(1-p_2)^{N-x_2}\, dp_2 = \frac{x_1!\,(N-x_1)!\;x_2!\,(N-x_2)!}{(N+1)!\,(N+1)!}.$$
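The integral formula $m!\,n!/(m+n+1)!$ can be checked numerically. The sketch below is an illustration added here, not the authors' procedure: it compares the closed form with a plain midpoint-rule estimate, and the function names and the number of steps are arbitrary assumptions.

```python
from math import factorial

def beta_integral_closed_form(m, n):
    """m! n! / (m + n + 1)!, the value quoted for the integral of p^m (1-p)^n."""
    return factorial(m) * factorial(n) / factorial(m + n + 1)

def beta_integral_midpoint(m, n, steps=20000):
    """Midpoint-rule estimate of the same integral over 0 <= p <= 1."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * width          # midpoint of the i-th strip
        total += p ** m * (1.0 - p) ** n
    return total * width

for m, n in [(2, 3), (5, 1), (0, 4)]:
    print(m, n, beta_integral_closed_form(m, n), beta_integral_midpoint(m, n))
```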


In the case (2), where $p_1 = p_2$, the probability of drawing samples $x_1$ and $x_2$ is

$$B = {}^{N}C_{x_1}\, p_1^{x_1}(1-p_1)^{N-x_1} \times {}^{N}C_{x_2}\, p_1^{x_2}(1-p_1)^{N-x_2}$$

$$\phantom{B} = \frac{(N!)^2}{x_1!\,x_2!\,(N-x_1)!\,(N-x_2)!}\; p_1^{x_1+x_2}(1-p_1)^{2N-x_1-x_2}.$$

Again $p_1$ may be assumed to range with equal probability between 0 and 1.

Hence the probability of drawing the two samples in this case is

$$\int_0^1 B \, dp_1.$$

Now

$$\int_0^1 p_1^{x_1+x_2}(1-p_1)^{2N-x_1-x_2}\, dp_1 = \frac{(x_1+x_2)!\,(2N-x_1-x_2)!}{(2N+1)!}.$$
In $A$ and $B$ the coefficients not involving $p_1$ and $p_2$ are identical. Hence we obtain the result:

$$\frac{\text{Probability of } (x_1, x_2) \text{ arising when } p_1 = p_2}{\text{Probability of } (x_1, x_2) \text{ arising when } p_1 \neq p_2} = \frac{(x_1+x_2)!\,(2N-x_1-x_2)!\,\{(N+1)!\}^2}{(2N+1)!\;x_1!\,x_2!\,(N-x_1)!\,(N-x_2)!}.$$

Assuming that $N$, $x_1$, and $x_2$ are all large numbers, by using Stirling's theorem we can approximate to this ratio; it becomes

$$\frac{N^{3/2}}{\sqrt{\pi (x_1+x_2)(2N-x_1-x_2)}}\, \exp\left\{-\frac{N(x_1-x_2)^2}{(x_1+x_2)(2N-x_1-x_2)}\right\}.$$

The problem of discriminating between the values of $p$ for the two populations arises only when $(x_1 - x_2)/N$ is small. For, by the method of Maximum Likelihood, $x_1/N$ and $x_2/N$ are the values of $p_1$ and $p_2$ for which $x_1$ and $x_2$ are the most probable samples. Accordingly let us write

$$\frac{x_1+x_2}{2N} = p \quad \text{and} \quad \frac{x_1-x_2}{N} = \delta p.$$

Then, finally, we may say that the relative likelihood equals

$$\frac{\text{Probability of drawing } (x_1, x_2) \text{ when the populations are identical}}{\text{Probability of drawing } (x_1, x_2) \text{ when the populations are different}} = \frac{L}{\sqrt{\pi}}\, e^{-L^2 (\delta p)^2},$$

where

$$L^2 = \frac{N}{4p(1-p)}.$$

If $d$, the actual difference between the two successful drawings $x_1$ and $x_2$, is constant for a population of increasing size, then clearly $L^2\, \delta p^2$ does not depend on $N$ and the exponential term remains constant.

Thus, for a given difference $x_2 - x_1$ between the two readings, the relative likelihood that the two populations do not differ increases directly with $L$, i.e. with $N^{1/2}$. When $p = \tfrac{1}{2}$, $L = N^{1/2}$.

Ex. 1. For a given difference $x_1 - x_2$ in the samples of given size $N$, find the probability $p$ for which the relative likelihood, that the populations are identical, is a maximum.

Ex. 2. Suppose that two samples, 100 each in number, are drawn from two bags containing black and white balls with the following results:

                   Total    Black    White
First sample        100       41       59
Second sample       100       49       51

We have

$$N = 100, \qquad p = \tfrac{1}{2}\left(\frac{41+49}{100}\right) = 0.45, \qquad \delta p = \frac{49-41}{100} = 0.08.$$

Thus

$$L^2 = \frac{100}{4 \times 0.45 \times 0.55} = 100, \text{ approx.}, \qquad L = 10.$$

Hence

$$\frac{\text{Probability of identity}}{\text{Probability of difference}} = \frac{10}{\sqrt{\pi}} \exp(-100 \times 0.08^2) = \frac{10}{\sqrt{\pi}} \exp(-0.64),$$

i.e. it is approximately three times more probable that the two populations are identical than that they are different.
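The figure of "approximately three" can be compared with the exact factorial ratio. The following Python sketch is an added illustration under stated assumptions: `exact_ratio` rewrites the factorial expression as $(N+1)^2/(2N+1)$ times ${}^{N}C_{x_1}\,{}^{N}C_{x_2}/{}^{2N}C_{x_1+x_2}$, which is algebraically the same quantity, and `approximate_ratio` uses $L/\sqrt{\pi}\; e^{-L^2(\delta p)^2}$ with $L^2 = N/4p(1-p)$. Both function names are hypothetical.

```python
from fractions import Fraction
from math import comb, exp, pi, sqrt

def exact_ratio(n, x1, x2):
    """Exact ratio (identical : different) from the factorial formula."""
    numerator = (n + 1) ** 2 * comb(n, x1) * comb(n, x2)
    denominator = (2 * n + 1) * comb(2 * n, x1 + x2)
    return float(Fraction(numerator, denominator))

def approximate_ratio(n, x1, x2):
    """Stirling form L / sqrt(pi) * exp(-L^2 * dp^2), L^2 = N / (4 p (1 - p))."""
    p = (x1 + x2) / (2 * n)
    delta_p = (x1 - x2) / n
    l_squared = n / (4 * p * (1 - p))
    return sqrt(l_squared / pi) * exp(-l_squared * delta_p ** 2)

exact = exact_ratio(100, 41, 49)
approx = approximate_ratio(100, 41, 49)
print(exact, approx)
```

For samples of this size the two values should agree to within a few per cent, so the rounded conclusion of the example does not depend on which is used.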


EXAMPLES ON CHAPTER IX 

Ex. 1. The probability of landing a shot within the annulus of radii $r$ and $r+dr$ on a target is $\frac{2h}{\sqrt{\pi}} \exp(-h^2 r^2)\, dr$. A thousand shots are fired at the target and 500 are found to lie within 1 ft. of the centre. What is the number of shots expected to lie within 3 in. of the centre, and what is the least distance from the centre within which one shot is likely to be found?

Taking the unit of length as 1 ft., we have, by hypothesis,

$$\frac{2h}{\sqrt{\pi}} \int_0^1 e^{-h^2 r^2}\, dr = \tfrac{1}{2},$$

or $\operatorname{Erf} h = 0.5$.

Thus $h = 0.48$.

Hence the number of shots likely to be found within 3 in. of the centre is

$$1{,}000 \cdot \frac{2h}{\sqrt{\pi}} \int_0^{1/4} e^{-h^2 r^2}\, dr.$$

The least distance $r$ from the centre within which one shot is likely to be found is given by

$$1{,}000 \cdot \frac{2h}{\sqrt{\pi}} \int_0^{r} e^{-h^2 r^2}\, dr = 1.$$
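The three quantities in this example can be evaluated directly, since the proportion of shots within distance $r$ is $\operatorname{Erf}(hr)$. The sketch below is an added illustration, assuming Python's `math.erf` (which uses the same definition of Erf as the Appendix) and a simple bisection helper whose name, `solve_increasing`, is hypothetical.

```python
from math import erf

def solve_increasing(f, target, lo, hi, iterations=100):
    """Bisection for f(x) = target, assuming f is increasing on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Erf h = 0.5 fixes h (unit of length: 1 ft.).
h = solve_increasing(erf, 0.5, 0.0, 2.0)

# Expected number of the 1,000 shots within 3 in. = 1/4 ft. of the centre.
shots_within_3_inches = 1000 * erf(h * 0.25)

# Distance within which one shot is expected: 1000 * Erf(h r) = 1.
least_distance_ft = solve_increasing(lambda r: 1000 * erf(h * r), 1.0, 0.0, 1.0)

print(round(h, 4), round(shots_within_3_inches, 1), least_distance_ft)
```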


Ex. 2. The Method of Least Squares. On p. 169 we have derived this method from the normal law; the same result may, however, be obtained, without such an assumption, by introducing the concept of weight. Suppose that $x, y, z, \dots$ are $n$ numbers and that

$$L_1 = a_1 x + b_1 y + c_1 z + \dots, \quad L_2 = a_2 x + b_2 y + c_2 z + \dots, \quad \dots, \quad L_s = a_s x + b_s y + c_s z + \dots$$

are $s\ (> n)$ linear functions of $x, y, \dots$ with given coefficients, for which we have the estimated values $u_1, u_2, \dots, u_s$. Then for the expression

$$L = \lambda_1 L_1 + \lambda_2 L_2 + \dots + \lambda_s L_s,$$

where the $\lambda$'s are constant, we shall have the estimate

$$\lambda_1 u_1 + \lambda_2 u_2 + \dots + \lambda_s u_s.$$

If we choose the $\lambda$'s so that

$$\lambda_1 a_1 + \lambda_2 a_2 + \dots + \lambda_s a_s = 1, \quad \lambda_1 b_1 + \lambda_2 b_2 + \dots + \lambda_s b_s = 0, \text{ etc.},$$

then $L$ will reduce to $x$, in which case $\lambda_1 u_1 + \lambda_2 u_2 + \dots + \lambda_s u_s$ is an estimated value for $x$.

We now define the weight $W$ of $L$, for any set of values of $\lambda_1, \lambda_2, \dots$, by the expression

$$\frac{1}{W} = \lambda_1^2 + \lambda_2^2 + \dots + \lambda_s^2.$$

Further, we assume that the best estimate for $x$ is that for which $W$ is a maximum. This gives us the condition $\lambda_1\, d\lambda_1 + \lambda_2\, d\lambda_2 + \dots + \lambda_s\, d\lambda_s = 0$.

If we solve the equation so obtained, using the method of undetermined multipliers, we find for $x$ the value that would have been derived from the Method of Least Squares as formerly explained.

For further details, as well as for justification of the present assumptions, the reader may consult Whittaker and Robinson, The Calculus of Observations, § 115.
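The agreement between the maximum-weight estimate and the least-squares value can be illustrated for two unknowns. The following Python sketch is an added example with invented data (the coefficient pairs in `rows` and the estimates `u` are assumptions, not taken from the text). It uses the standard result that the multipliers minimising $\sum \lambda_i^2$ under the two linear conditions are $\lambda_i = a_i m_{11} + b_i m_{12}$, where $m$ is the inverse of the matrix of the normal equations.

```python
def normal_matrix_inverse(rows):
    """Inverse of the 2x2 normal-equation matrix built from (a_i, b_i) pairs."""
    saa = sum(a * a for a, b in rows)
    sab = sum(a * b for a, b in rows)
    sbb = sum(b * b for a, b in rows)
    det = saa * sbb - sab * sab
    return ((sbb / det, -sab / det), (-sab / det, saa / det))

def least_squares_x(rows, u):
    """x from the normal equations for the system a_i x + b_i y = u_i."""
    inv = normal_matrix_inverse(rows)
    au = sum(a * ui for (a, b), ui in zip(rows, u))
    bu = sum(b * ui for (a, b), ui in zip(rows, u))
    return inv[0][0] * au + inv[0][1] * bu

def max_weight_multipliers(rows):
    """Multipliers minimising sum(l^2) with sum(l*a) = 1 and sum(l*b) = 0."""
    inv = normal_matrix_inverse(rows)
    return [a * inv[0][0] + b * inv[0][1] for a, b in rows]

rows = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]   # hypothetical (a_i, b_i)
u = [1.1, 1.9, 3.2]                           # hypothetical estimates u_i

lam = max_weight_multipliers(rows)
x_from_weights = sum(l * ui for l, ui in zip(lam, u))
x_from_least_squares = least_squares_x(rows, u)
print(lam, x_from_weights, x_from_least_squares)
```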

Ex. 3. The speed of a train is recorded every second by an instrument which in reality gives the average reading over the previous $T$ seconds. If the recorded speed $u(t)$ is found to follow the formula

$$u(t) = at^2 + bt + c,$$

determine the true speed $v(t)$.

Here

$$u(t) = \frac{1}{T}\int_{-T}^{0} v(t+x)\, dx = \frac{1}{T}\int_{-T}^{0} e^{xD} v(t)\, dx, \quad \text{where } D \equiv \frac{d}{dt},$$

$$\phantom{u(t)} = \frac{1}{T}\left[\frac{e^{xD}}{D}\right]_{-T}^{0} v(t) = \frac{1}{TD}\left(TD - \frac{T^2D^2}{2!} + \frac{T^3D^3}{3!} - \dots\right) v(t).$$

Hence

$$v(t) = \left(1 - \frac{TD}{2} + \frac{T^2D^2}{6} - \dots\right)^{-1} u(t) = \left(1 + \frac{TD}{2} + \frac{T^2D^2}{12} + \dots\right) u(t)$$

$$\phantom{v(t)} = u(t) + \frac{T}{2}u'(t) + \frac{T^2}{12}u''(t), \text{ approximately.}$$

Hence

$$v(t) = at^2 + bt + c + \frac{T}{2}(2at + b) + \frac{T^2}{12}\, 2a = at^2 + (b + aT)t + \frac{aT^2}{6} + \frac{bT}{2} + c.$$
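The result can be verified by averaging the proposed $v(t)$ over the previous $T$ seconds and comparing with $at^2 + bt + c$. The sketch below is an added check in Python; the numerical values of $a$, $b$, $c$, $T$ are arbitrary assumptions chosen only for illustration.

```python
a, b, c, T = 0.3, 2.0, 5.0, 4.0   # illustrative values, not from the text

def u(t):
    """Recorded speed."""
    return a * t ** 2 + b * t + c

def v(t):
    """True speed obtained in the example."""
    return a * t ** 2 + (b + a * T) * t + a * T ** 2 / 6 + b * T / 2 + c

def v_antiderivative(t):
    """An antiderivative of v, used to average v exactly."""
    constant = a * T ** 2 / 6 + b * T / 2 + c
    return a * t ** 3 / 3 + (b + a * T) * t ** 2 / 2 + constant * t

def recorded_from_true(t):
    """Average of v over the interval from t - T to t."""
    return (v_antiderivative(t) - v_antiderivative(t - T)) / T

for t in (0.0, 1.5, 10.0):
    print(t, u(t), recorded_from_true(t))
```

Because $u$ is a quadratic, the series in $D$ terminates and the check holds exactly, apart from rounding.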


Ex. 4. Show that, if $u(t)$ is the recorded measurement at time $t$, where in fact it is the average over a period $2T$ lying evenly about time $t$, then the true measurement $v(t)$ is given by

$$v(t) = u(t) - \tfrac{1}{6}T^2 u''(t) + \tfrac{7}{360}T^4 u^{iv}(t) \dots$$

$$\phantom{v(t)} = u(t) - \tfrac{1}{6}T^2 \Delta^2 u(t) + \tfrac{7}{360}T^4 \Delta^4 u(t) + \tfrac{1}{72}T^2 \Delta^4 u(t) \dots.$$


Ex. 5. In a square lake depth soundings are taken from a boat at a series of points forming the corners of the 25 squares into which the surface of the lake is divided. The errors in placing the boat in each position for sounding are given by the law $Ae^{-r^2}$, where $r$ is the accidental deviation from the true position. If the figures in the diagram are the readings obtained, find the true distribution of depth.

    0      0      0      0      0      0
    0     1.5    1.8    1.8    1.6     0
    0     1.8    2.1    1.9    1.6     0
    0     1.8    2.2    2.0    1.7     0
    0     1.6    2.0    2.2    1.7     0
    0      0      0      0      0      0

Ex. 6. Show that, if different sets of observations each satisfy a Gaussian law of the form $\frac{h}{\sqrt{\pi}} e^{-h^2 x^2}$, where the $h$'s may adopt any values occurring with a probability given by $\frac{2a}{\sqrt{\pi}} e^{-h^2 a^2}$, then the law of distribution of the $x$'s is of the form $\frac{a}{\pi (a^2 + x^2)}$.

Ex. 7. For a probability distribution of the type $\frac{2a^3}{\pi} \frac{1}{(a^2 + x^2)^2}$, find the mean value of $x$ and $x^2$.



APPENDIX 


$$\operatorname{Erf} x = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt.$$
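Entries of this kind can be regenerated for checking. The following is a minimal sketch, assuming Python's standard `math.erf`, which follows the same definition as above; the helper name `erf_table` and the choice of five decimal places are illustrative assumptions.

```python
from math import erf

def erf_table(start, stop, step=0.02):
    """Return (x, Erf x) pairs from start to stop, rounded to 5 places."""
    count = round((stop - start) / step)
    rows = []
    for i in range(count + 1):
        x = start + i * step
        rows.append((round(x, 2), round(erf(x), 5)))
    return rows

rows = erf_table(0.02, 0.10)
for x, value in rows:
    print(f"{x:.2f}  {value:.5f}")
```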


x       Erf x        x       Erf x        x       Erf x
0.02    0.02256      1.02    0.85084      2.02    0.99572
0.04    0.04511      1.04    0.85865      2.04    0.99609
0.06    0.06762      1.06    0.86614      2.06    0.99642
0.08    0.09008      1.08    0.87333      2.08    0.99673
0.10    0.11246      1.10    0.88021      2.10    0.99702
0.12    0.13476      1.12    0.88679      2.12    0.99728
0.14    0.15695      1.14    0.89308      2.14    0.99753
0.16    0.17901      1.16    0.89910      2.16    0.99775
0.18    0.20093      1.18    0.90484      2.18    0.99795
0.20    0.22270      1.20    0.91031      2.20    0.99814
0.22    0.24430      1.22    0.91553      2.22    0.99831
0.24    0.26570      1.24    0.92051      2.24    0.99846
0.26    0.28690      1.26    0.92524      2.26    0.99861
0.28    0.30788      1.28    0.92973      2.28    0.99874
0.30    0.32863      1.30    0.93401      2.30    0.99886
0.32    0.34913      1.32    0.93807      2.32    0.99897
0.34    0.36936      1.34    0.94191      2.34    0.99906
0.36    0.38933      1.36    0.94556      2.36    0.99915
0.38    0.40901      1.38    0.94902      2.38    0.99924
0.40    0.42839      1.40    0.95229      2.40    0.99931
0.42    0.44747      1.42    0.95538      2.42    0.99938
0.44    0.46623      1.44    0.95830      2.44    0.99944
0.46    0.48466      1.46    0.96105      2.46    0.99950
0.48    0.50275      1.48    0.96365      2.48    0.99955
0.50    0.52050      1.50    0.96611      2.50    0.99959
0.52    0.53790      1.52    0.96841      2.52    0.99963
0.54    0.55494      1.54    0.97059      2.54    0.99967
0.56    0.57162      1.56    0.97263      2.56    0.99971
0.58    0.58792      1.58    0.97455      2.58    0.99974
0.60    0.60386      1.60    0.97635      2.60    0.99976
0.62    0.61941      1.62    0.97804      2.62    0.99979
0.64    0.63459      1.64    0.97962      2.64    0.99981
0.66    0.64938      1.66    0.98110      2.66    0.99983
0.68    0.66378      1.68    0.98249      2.68    0.99985
0.70    0.67780      1.70    0.98379      2.70    0.99987
0.72    0.69143      1.72    0.98500      2.72    0.99988
0.74    0.70468      1.74    0.98613      2.74    0.99989
0.76    0.71754      1.76    0.98719      2.76    0.99991
0.78    0.73001      1.78    0.98817      2.78    0.99992
0.80    0.74210      1.80    0.98909      2.80    0.99992
0.82    0.75381      1.82    0.98994      2.82    0.99993
0.84    0.76514      1.84    0.99074      2.84    0.99994
0.86    0.77610      1.86    0.99147      2.86    0.99995
0.88    0.78669      1.88    0.99216      2.88    0.99995
0.90    0.79691      1.90    0.99279      2.90    0.99996
0.92    0.80677      1.92    0.99338      2.92    0.99996
0.94    0.81627      1.94    0.99392      2.94    0.99997
0.96    0.82542      1.96    0.99443      2.96    0.99997
0.98    0.83423      1.98    0.99489      2.98    0.99997
1.00    0.84270      2.00    0.99532      3.00    0.99998
                                          3.123   0.999990
                                          3.459   0.999999